Harnessing the Power of Spark + Cassandra within your Spring App

Steve Pember, CTO, ThirdChannel (@svpember)

Post on 29-Jan-2018

@spring_io #springio17

RELATIONAL DATABASES ARE FANTASTIC

SQL MAKES YOU STRONG

Agenda

• Spark
• Cassandra
• Spark + Cassandra
• Working with Spark + Cassandra
• Demo

Apache Spark

• Distributed Execution Engine

Apache Spark

• Distributed Execution Engine

• What about Hadoop?

Hadoop

• Map / Reduce
• Storage via HDFS
• Each calculation step written to disk

Spark

• More than Map/Reduce
• No dependent storage mechanism
• Clustered calculations, each step in memory

Apache Spark

• Distributed Execution Engine

• What about Hadoop?

• Creation was a Happy Accident

Apache Spark

• Distributed Execution Engine

• What about Hadoop?

• Creation was a Happy Accident

• Architecture

Your Spring App

Apache Spark

• Distributed Execution Engine

• What about Hadoop?

• Creation was a Happy Accident

• Architecture

• Programmatic structure

THE SPARKCONTEXT SUBMITS JOBS TO THE CLUSTER

OPERATIONS ARE PERFORMED AGAINST RDDS

Resilient Distributed Dataset

• Immutable
• Partitioned
• Parallel operations
• Created by performing operations on other RDDs
• Reusable & Composable

Apache Spark

• Distributed Execution Engine

• What about Hadoop?

• Creation was a Happy Accident

• Architecture

• Programmatic structure

• APIs

MORE THAN MAP/REDUCE

RDD operations

• map
• reduce
• aggregate
• filter
• flatMap
• join
• … plus many more
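As a rough local analogy, and assuming nothing beyond the JDK, java.util.stream offers operations with the same names as several of the RDD operations above. The sketch below is not Spark code: the class and method names are made up for illustration, and an RDD would run the equivalent steps across partitions on a cluster rather than in a single JVM.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, plain JDK only; it mirrors RDD-style operation names.
class RddStyleOperations {

    // flatMap: one line becomes many words; filter: drop empty tokens;
    // map: normalise each word to lower case.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .filter(word -> !word.isEmpty())
                .map(String::toLowerCase)
                .collect(Collectors.toList());
    }

    // map + reduce: turn each word into its length, then sum the lengths.
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Spark and Cassandra", "in your Spring app");
        List<String> words = words(lines);
        System.out.println(words);
        System.out.println(totalLength(words));
    }
}
```

Each step returns a new stream instead of modifying its input, which loosely echoes how an RDD is created by performing operations on other RDDs.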

Apache Spark

• Distributed Execution Engine

• What about Hadoop?

• Creation was a Happy Accident

• Architecture

• Programmatic structure

• APIs

• Additional Modules

SPARK SQL…!

JDBC?

SPARK STREAMING!

Agenda

• Spark

• Cassandra

Apache Cassandra (C*)

• NoSQL Datastore

Apache Cassandra (C*)

• NoSQL Datastore

• Distributed

DETERMINISTIC DISTRIBUTION
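
One way to picture deterministic distribution is a token ring: every node claims a position on the ring, and a row's partition key is hashed to find the node that owns it. The sketch below is a simplified illustration under stated assumptions: the class name, node names, and the use of String.hashCode() are all made up here, whereas Cassandra's default partitioner uses Murmur3 and typically assigns many tokens per node.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified token ring; not Cassandra's actual implementation.
class TokenRing {
    // Sorted map from token (ring position) to the node that claimed it.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Each node claims a single token derived from its name.
    TokenRing(List<String> nodes) {
        for (String node : nodes) {
            ring.put(token(node), node);
        }
    }

    // Stand-in hash function (assumption); Cassandra's default is Murmur3.
    static int token(String key) {
        return key.hashCode();
    }

    // The owner is the first node whose token is >= the key's token,
    // wrapping around to the smallest token at the end of the ring.
    String ownerOf(String partitionKey) {
        Map.Entry<Integer, String> entry = ring.ceilingEntry(token(partitionKey));
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing(Arrays.asList("node-a", "node-b", "node-c"));
        for (String key : Arrays.asList("user-1", "user-2", "user-3")) {
            System.out.println(key + " -> " + ring.ownerOf(key));
        }
    }
}
```

Because the owner depends only on the key and the set of tokens, any client or node can compute where a row lives without consulting a central directory.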

Apache Cassandra (C*)

• NoSQL Datastore

• Distributed

• High Replication

Apache Cassandra (C*)

• NoSQL Datastore

• Distributed

• High Replication

• High Durability

Apache Cassandra (C*)

• NoSQL Datastore

• Distributed

• High Replication

• High Durability

• Linear Scalability

EACH NEW NODE RESULTS IN INCREASED STORAGE WITH NO LOSS IN PERFORMANCE

Apache Cassandra (C*)

• NoSQL Datastore

• Distributed

• High Replication

• High Durability

• Linear Scalability

• Data Model (CQL)

COLUMN ORIENTED DATABASE

BUT IT’S SQL-LIKE!

QUERYING

C* Querying

• select * from …
• all queries must include partition key(s)
• order by limited to clustering keys

Apache Cassandra (C*)

• NoSQL Datastore

• Distributed

• High Replication

• High Durability

• Linear Scalability

• Data Model (CQL)

• Designing your Data Model

Agenda

• Spark

• Cassandra

• Spark + Cassandra

Spark + Cassandra

– Reduce each other’s weaknesses
– Filter on the server side (with C*)
– Join tables, filter results (with Spark)

COMPANIES HAVE BEEN FORMED

CLUSTER DESIGN

DATA LOCALITY!

PIPELINE ARCHITECTURE

Agenda

• Spark

• Cassandra

• Spark + Cassandra

• Working with Spark + Cassandra

OPTIONS FOR SPRING?

BUT WE DIDN’T GO THAT ROUTE

Our Excuses

• Wanted to take full advantage of Spark + C* connector
• Our setup / pipeline is relatively minimal
• Programming model is easy

CODING SPARK + C*

• SparkConf
• JavaSparkContext
• JavaFunctions
• Mappers

Spark Conf

• spark.master -> URL of the master node
• spark.app.name -> want to see your client show up in the Spark UI?
• spark.executor.memory -> limits memory per executor on workers
• spark.executor.cores -> limits cores per executor on each worker (need to share with C*!)
• spark.submit.deployMode -> ‘client’ or ‘cluster’
• spark.jars.packages -> Maven / Gradle style coordinates (groupId:artifactId:version)
• spark.jars.ivy -> Ivy user directory, used as the local cache when resolving packages
• more at: http://spark.apache.org/docs/latest/configuration.html#available-properties
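
To make the value formats concrete, here is a minimal sketch that collects a few of the properties above in a plain map. Every value (host name, application name, memory and core counts) is a placeholder assumption, not a recommendation, and the class name is hypothetical. The sketch deliberately avoids a Spark dependency; with Spark on the classpath, each pair could be applied through `SparkConf.set(key, value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for example settings; all values are illustrative assumptions.
class SparkSettings {

    static Map<String, String> example() {
        Map<String, String> settings = new LinkedHashMap<>();
        // 7077 is the default port of a Spark standalone master; the host is made up.
        settings.put("spark.master", "spark://spark-master.example.com:7077");
        settings.put("spark.app.name", "my-spring-app");
        // Memory is given with a unit suffix, cores as a plain integer.
        settings.put("spark.executor.memory", "2g");
        settings.put("spark.executor.cores", "2");
        settings.put("spark.submit.deployMode", "client");
        return settings;
    }

    public static void main(String[] args) {
        // Print the settings in "key=value" form, one per line.
        example().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Keeping the settings in a map like this makes it straightforward to source them from Spring configuration and hand them to Spark in one place.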

Master URL Overloading

• “local” -> run Spark locally, in-process, with one thread
• “local[<K>]” -> run Spark locally with K threads
• “local[*]” -> run Spark locally with ALL YOUR THREADS! (one per logical core)
• “spark://<host string>:<port>” -> URL of a Spark cluster master node, using Spark’s own (standalone) cluster management
• also options for Mesos and YARN
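
The forms listed above can be told apart by simple string matching. The helper below is a hypothetical sketch, not part of Spark: it only classifies a master string so the overloading is easier to see, and it does not cover every form Spark accepts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that describes a Spark master string; not a Spark API.
class MasterUrls {
    // Matches "local[K]" where K is a positive thread count written in digits.
    private static final Pattern LOCAL_K = Pattern.compile("local\\[(\\d+)\\]");

    static String describe(String master) {
        if (master.equals("local")) {
            return "local mode, 1 thread";
        }
        if (master.equals("local[*]")) {
            return "local mode, one thread per logical core";
        }
        Matcher matcher = LOCAL_K.matcher(master);
        if (matcher.matches()) {
            return "local mode, " + matcher.group(1) + " threads";
        }
        if (master.startsWith("spark://")) {
            return "Spark standalone cluster master";
        }
        if (master.startsWith("mesos://")) {
            return "Mesos cluster";
        }
        if (master.equals("yarn")) {
            return "YARN cluster";
        }
        return "not covered by this sketch";
    }

    public static void main(String[] args) {
        for (String master : new String[] {"local", "local[4]", "local[*]", "spark://host:7077"}) {
            System.out.println(master + " -> " + describe(master));
        }
    }
}
```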

HOWEVER, A WARNING

MOST DIFFICULT PART: WHERE DOES MY CODE LIVE?

CLASS_PATH: org.apache.spark, com.fasterxml.jackson, com.yourco.yourapp.pojos.*

CLASS_PATH: org.apache.spark, com.fasterxml.jackson

CLASS_PATH: org.apache.spark, com.fasterxml.jackson

Agenda

• Spark

• Cassandra

• Spark + Cassandra

• Working with Spark + Cassandra

• Demo

Thank You!

@svpember

Links

• Cassandra on AWS official whitepaper: https://d0.awsstatic.com/whitepapers/Cassandra_on_AWS.pdf

• Demo Code project link: https://github.com/spember/spark-cass-spring-demo

Images

• Database Sharding: https://dzone.com/articles/ebay-secret-database-scaling

• Indiana Jones Warehouse: http://logisticalfictions.tumblr.com/page/9

• Strong (Spongebob): www.reactiongifs.com/strongbob/?utm_source=rss&utm_medium=rss&utm_campaign=strongbob

• Cheetah: www.livescience.com/21944-usain-bolt-vs-cheetah-animal-olympics.html

• Big Data Cartoon: http://www.kdnuggets.com/2016/08/cartoon-make-data-great-again.html

• Spark Streaming: http://velvia.github.io/presentations/2015-filodb-spark-streaming/#/

• Picard + Riker: http://www.douxreviews.com/2015/09/star-trek-next-generation-matter-of.html

• Software Engineers: http://pyxurz.blogspot.com/2011/10/office-space-page-2-of-6.html

• Throwing Money: https://vimeo.com/132892478