Webinar - Big Data: Let's SMACK - Jörg Schad
Big Data: Let's SMACK @joerg_schad
You can come and see me at Codemotion Milan 2017!
Talk: NO ONE PUTS Java IN THE CONTAINER, November 10th, 15:10-15:50
http://milan2017.codemotionworld.com/
© 2017 Mesosphere, Inc. All Rights Reserved. 3
Jörg Schad, Software Engineer @Mesosphere
@joerg_schad
@joerg.mesosphere
MapReduce is crunching Data
Ancient Times...
But then business demanded FAST DATA
We need to turn faster!
Today...
Batch | Micro-Batch | Event Processing
Days | Hours | Minutes | Seconds | Microseconds
Reports what has happened using descriptive analytics
Solves problems using predictive and prescriptive analytics
Use cases: Predictive User Interface; Real-time Pricing and Routing; Real-time Advertising; Billing, Chargeback; Product recommendations
(Fast) Data Processing
Fast Data Pipeline
EVENTS: Ubiquitous data streams from connected devices
INGEST: Ingest millions of events per second
STORE: Distributed & highly scalable database
ANALYZE: Process data in real time and in batch
ACT: Visualize data and build data-driven applications
The SMACK Stack
EVENTS (Sensors, Devices, Clients): Ubiquitous data streams from connected devices
INGEST (Apache Kafka): Ingest millions of events per second
STORE (Apache Cassandra): Distributed & highly scalable database
ANALYZE (Apache Spark): Process data in real time and in batch
ACT (Akka): Visualize data and build data-driven applications
Runs on: Apache Mesos
Datacenter
NAIVE APPROACH
Typical datacenter: siloed, over-provisioned servers, low utilization
Industry average: 12-15% utilization
One silo each for: MySQL, microservice, Cassandra, Spark/Hadoop, Kafka
MULTIPLEXING OF DATA, SERVICES, USERS, ENVIRONMENTS
Typical datacenter: siloed, over-provisioned servers, low utilization
Mesos/DC/OS: automated schedulers, workload multiplexing onto the same machines
Workloads: MySQL, microservice, Cassandra, Spark/Hadoop, Kafka
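To make the utilization argument concrete, here is a toy sketch comparing one-machine-per-workload silos with packing the same workloads onto shared machines. The per-workload demand numbers are hypothetical (chosen in the 12-15% range quoted above), and the first-fit-decreasing packing is a stand-in for "workload multiplexing", not how Mesos or DC/OS actually place tasks.

```python
from typing import Dict, List


def siloed_utilization(demands: Dict[str, float]) -> float:
    """Naive approach: every workload gets its own dedicated machine."""
    return sum(demands.values()) / len(demands)


def first_fit_decreasing(demands: Dict[str, float], capacity: float = 1.0) -> List[List[str]]:
    """Toy multiplexing: put each workload on the first shared machine with room left."""
    machines: List[List[str]] = []
    free: List[float] = []
    for name, need in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        for i, space in enumerate(free):
            if need <= space:
                machines[i].append(name)
                free[i] -= need
                break
        else:  # no existing machine has room: start a new one
            machines.append([name])
            free.append(capacity - need)
    return machines


def multiplexed_utilization(demands: Dict[str, float], capacity: float = 1.0) -> float:
    machines = first_fit_decreasing(demands, capacity)
    return sum(demands.values()) / (len(machines) * capacity)


# Hypothetical demand per workload, as a fraction of one machine.
demands = {"mysql": 0.15, "microservice": 0.12, "cassandra": 0.15, "spark": 0.14, "kafka": 0.13}
print(f"siloed:      {siloed_utilization(demands):.1%}")
print(f"multiplexed: {multiplexed_utilization(demands):.1%}")
```

Real clusters keep headroom for load spikes and failures, so packing everything onto a single machine is only meant to show the direction of the effect, not a target.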
Apache Mesos
• A top-level Apache project
• A cluster resource negotiator
• Scalable to 10,000s of nodes
• Fault-tolerant, battle-tested
• An SDK for distributed apps
• Native Docker support
MESOS: FUNDAMENTAL ARCHITECTURE
(Diagram: three Mesos Masters; framework schedulers (Cassandra Scheduler, Container Scheduler, Spark Scheduler); Mesos Agents, each running the Mesos Agent Service plus executors and their tasks: Cassandra Executor/Task and Spark Executor/Task on one agent, Docker Executor/Task and Spark Executor/Task on another.)
Two-level scheduling:
1. Agents advertise resources to the Master
2. Master offers resources to Frameworks
3. Framework rejects / uses resources
4. Agent reports task status to Master
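The four steps can be simulated in a few lines. This is a minimal sketch, assuming hypothetical `Master`, `Framework`, `Offer` and `Task` classes and a naive "offer to each framework in turn" policy; it is not the Mesos API, and real Mesos uses an allocator (e.g. DRF) and forwards status updates to the framework rather than just recording them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Offer:
    agent_id: str
    cpus: float
    mem_mb: int


@dataclass
class Task:
    name: str
    cpus: float
    mem_mb: int


class Framework:
    """Second level: each framework's scheduler decides what to do with an offer."""

    def __init__(self, name: str, pending: List[Task]):
        self.name = name
        self.pending = list(pending)

    def consider(self, offer: Offer) -> Optional[Task]:
        # Step 3: use the offer for the first pending task that fits, otherwise decline it.
        for task in self.pending:
            if task.cpus <= offer.cpus and task.mem_mb <= offer.mem_mb:
                self.pending.remove(task)
                return task
        return None


class Master:
    """First level: the master decides which framework is offered which resources."""

    def __init__(self) -> None:
        self.free: Dict[str, Offer] = {}
        self.status: Dict[str, str] = {}

    def advertise(self, agent_id: str, cpus: float, mem_mb: int) -> None:
        # Step 1: an agent advertises its resources to the master.
        self.free[agent_id] = Offer(agent_id, cpus, mem_mb)

    def offer_round(self, frameworks: List[Framework]) -> None:
        # Step 2: offer each agent's remaining resources to the frameworks in turn.
        for agent_id, offer in self.free.items():
            for fw in frameworks:
                task = fw.consider(offer)
                if task is None:
                    continue  # declined: resources stay available for the next framework
                offer.cpus -= task.cpus
                offer.mem_mb -= task.mem_mb
                # Step 4: the agent reports the task status back to the master.
                self.status[f"{fw.name}/{task.name}"] = f"RUNNING on {agent_id}"


master = Master()
master.advertise("agent-1", cpus=4.0, mem_mb=8192)
master.advertise("agent-2", cpus=2.0, mem_mb=4096)
spark = Framework("spark", [Task("executor-1", 2.0, 4096), Task("executor-2", 2.0, 4096)])
cassandra = Framework("cassandra", [Task("node-0", 2.0, 4096)])
master.offer_round([spark, cassandra])
print(master.status)
```

The split of responsibilities is the point: the master only decides who sees which resources, while placement logic (which task fits where) stays inside each framework.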
DC/OS ENABLES MODERN DISTRIBUTED APPS
Big Data + Analytics Engines and Microservices (in containers): Streaming, Batch, Machine Learning, Analytics, Functions & Logic, Search, Time Series, SQL / NoSQL Databases, Modern App Components
Datacenter Operating System (DC/OS)
Distributed Systems Kernel (Mesos)
Any Infrastructure (Physical, Virtual, Cloud)
DATA PROCESSING AT HYPERSCALE
EVENTS (Sensors, Devices, Clients): Ubiquitous data streams from connected devices
INGEST (Apache Kafka): Ingest millions of events per second
STORE (Apache Cassandra): Distributed & highly scalable database
ANALYZE (Apache Spark): Process data in real time and in batch
ACT (Akka): Visualize data and build data-driven applications
Runs on: DC/OS
MESSAGE QUEUES
• Apache Kafka
• ØMQ, RabbitMQ, Disque (Redis-based), etc.
• fluentd, Logstash, Flume
• Akka streams
• cloud-only: AWS SQS, Google Cloud Pub/Sub
APACHE KAFKA
High-throughput, distributed, persistent publish-subscribe messaging system
• Originated at LinkedIn
• Typically used as a buffer/decoupling layer in online stream processing
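Why a persistent log works as a decoupling buffer can be shown with a toy model: producers append to partitions, and each consumer group tracks its own offsets, so slow or late consumers never block producers or each other. `TopicLog` and `ConsumerGroup` are hypothetical names, and the byte-sum partitioner is a deliberately simple stand-in; this is not the Kafka client API or Kafka's real partitioner.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TopicLog:
    """Toy append-only, partitioned log (records are retained after being read)."""

    def __init__(self, partitions: int):
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(partitions)]

    def publish(self, key: str, value: str) -> Tuple[int, int]:
        # Same key -> same partition, so ordering is preserved per key.
        # Byte sum instead of hash(): Python's str hash is randomized per process.
        p = sum(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class ConsumerGroup:
    """Each group keeps its own offsets, independent of producers and other groups."""

    def __init__(self, log: TopicLog):
        self.log = log
        self.offsets: Dict[int, int] = defaultdict(int)

    def poll(self) -> List[Tuple[str, str]]:
        out: List[Tuple[str, str]] = []
        for p, records in enumerate(self.log.partitions):
            out.extend(records[self.offsets[p]:])
            self.offsets[p] = len(records)
        return out


log = TopicLog(partitions=3)
log.publish("AAPL", "101.5")
log.publish("MSFT", "55.0")
log.publish("AAPL", "102.0")

analytics = ConsumerGroup(log)
archiver = ConsumerGroup(log)
print(analytics.poll())      # everything published so far
print(analytics.poll())      # nothing new for this group
print(len(archiver.poll()))  # a second group still sees all records
```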
fluentd
HOW TO CHOOSE?
● Scalability
● Message Type
● Log vs …
● Delivery Guarantees/Message durability
● Routing Capabilities
● Failover
● Community
● Mesos Support ;-)
DELIVERY GUARANTEES
At most once: messages may be lost but are never redelivered.
At least once: messages are never lost but may be redelivered.
Exactly once: each message is delivered once and only once. This is what people actually want.
Murphy’s Law of Distributed Systems:
Anything that can go wrong, will go wrong … partially!
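The difference between the guarantees comes down to when the consumer commits its offset relative to processing, and what happens if it crashes in between (the "partial" failure above). The sketch below simulates one crash; the `dedup` mode shows the common workaround of at-least-once delivery plus an idempotent sink, which gives an exactly-once *effect* rather than true once-only delivery. The function and mode names are hypothetical.

```python
from typing import List, Optional, Set


def consume(messages: List[str], mode: str, crash_at: Optional[int] = None) -> List[str]:
    """Simulate a consumer that crashes once at index `crash_at`, then restarts
    from its last committed offset.

    "at_most_once":  commit the offset, then process (a crash in between loses the message)
    "at_least_once": process, then commit (a crash in between redelivers the message)
    "dedup":         at_least_once plus an idempotent sink that skips already-seen ids
    """
    if mode not in ("at_most_once", "at_least_once", "dedup"):
        raise ValueError(f"unknown mode: {mode}")
    committed = 0
    sink: List[str] = []
    seen: Set[int] = set()
    crashed = False
    while committed < len(messages):
        i = committed
        crash_now = i == crash_at and not crashed
        if mode == "at_most_once":
            committed = i + 1
            if crash_now:
                crashed = True
                continue  # restart: offset already committed, message i is never processed
            sink.append(messages[i])
        else:
            if not (mode == "dedup" and i in seen):
                sink.append(messages[i])
                seen.add(i)
            if crash_now:
                crashed = True
                continue  # restart: offset not committed, message i is delivered again
            committed = i + 1
    return sink


msgs = ["m0", "m1", "m2"]
for mode in ("at_most_once", "at_least_once", "dedup"):
    print(mode, consume(msgs, mode, crash_at=1))
```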
ROUTING: Simple Pipes vs. Routing
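A simple pipe hands every message to the same downstream queue, while a routing broker dispatches each message by its routing key. As one illustration of routing, here is a sketch of AMQP-style topic matching (as used by RabbitMQ topic exchanges: `*` matches exactly one dot-separated word, `#` matches zero or more). The `route` helper and the binding names are hypothetical, and this is a re-implementation of the matching rule, not a broker client.

```python
from typing import Dict, List


def topic_matches(pattern: str, key: str) -> bool:
    """'*' matches exactly one word, '#' matches zero or more words."""

    def match(p: List[str], k: List[str]) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # '#' either absorbs zero words, or consumes one word and tries again.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), key.split("."))


def route(bindings: Dict[str, List[str]], key: str) -> List[str]:
    """Return every queue with at least one binding pattern matching the routing key."""
    return [queue for queue, patterns in bindings.items()
            if any(topic_matches(p, key) for p in patterns)]


bindings = {"eu-trades": ["trade.eu.*"], "all-trades": ["trade.#"], "audit": ["#"]}
print(route(bindings, "trade.eu.equity"))
print(route(bindings, "quote.us.fx"))
```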
STREAM PROCESSING
• Apache Storm
• Apache Spark
• Apache Samza
• Apache Flink
• Apache Apex
• Concord
• cloud-only: AWS Kinesis, Google Cloud Dataflow
APACHE SPARK
APACHE SPARK (STREAMING)
Typical use: distributed, large-scale data processing; micro-batching
Why Spark Streaming?
• Micro-batching keeps latency low (though not as low as native streaming), which is fast enough for many use cases
• A well-defined role means it fits in well with other pieces of the pipeline
HOW TO CHOOSE?
● Execution Model: Native Streaming vs. Microbatch
● Fault Tolerance Granularity: per record, per batch
● Delivery Guarantees
● API
● SQL
● Spark
● Performance: Realtime ≠ Realtime
● Community
● Mesos Support ;-)
EXECUTION MODEL: Micro-Batching vs. Native Streaming
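The latency cost of micro-batching is easy to see in a small simulation: native streaming emits each result when its record arrives, while micro-batching buffers records and emits all results only when the batch interval closes. The event timestamps, the 1-second interval and the `times_ten` transformation are hypothetical; the sketch ignores processing time and assumes events arrive sorted by time.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[float, int]  # (arrival time in seconds, value); assumed sorted by arrival time


def native_streaming(events: List[Event], f: Callable[[int], int]) -> List[Tuple[float, int]]:
    """Each record is processed the moment it arrives: emit time == arrival time."""
    return [(t, f(v)) for t, v in events]


def micro_batching(events: List[Event], f: Callable[[int], int],
                   interval: float) -> List[Tuple[float, int]]:
    """Records are buffered into fixed intervals; a batch is processed when its interval closes."""
    batches: Dict[int, List[int]] = {}
    for t, v in events:
        batches.setdefault(int(t // interval), []).append(v)
    out: List[Tuple[float, int]] = []
    for idx in sorted(batches):
        emit_time = (idx + 1) * interval
        out.extend((emit_time, f(v)) for v in batches[idx])
    return out


def worst_latency(events: List[Event], results: List[Tuple[float, int]]) -> float:
    """Largest gap between a record's arrival and the emission of its result."""
    return max(emit - arrival for (arrival, _), (emit, _) in zip(events, results))


def times_ten(v: int) -> int:
    return v * 10


events = [(0.1, 1), (0.4, 2), (1.2, 3)]
print(worst_latency(events, native_streaming(events, times_ten)))
print(worst_latency(events, micro_batching(events, times_ten, interval=1.0)))
```

Both models compute the same results; only the emission time differs, which is why "Realtime ≠ Realtime" depends on the batch interval a system is configured with.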
FAULT TOLERANCE: Checkpoint per “Batch” / Ack-Per-Record / Checkpoint per Batch
DELIVERY GUARANTEES: “Exactly once” / At least once
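One way to see why per-batch checkpoints can give "exactly once" while per-record acks give at least once: if operator state and input offset are snapshotted together, recovery rolls both back and the replay recounts nothing twice; if the state lives outside the checkpoint, replaying un-acked input double-counts. This is a simplified sketch with hypothetical function names and a running sum as the "state"; the quotes around "exactly once" are deserved, because external side effects still need idempotent or transactional sinks.

```python
from typing import List


def count_with_checkpoints(stream: List[int], batch_size: int, fail_after: int) -> int:
    """Sum a stream; (offset, state) is snapshotted atomically after each batch.
    A crash right after record number `fail_after` restores BOTH from the snapshot."""
    checkpoint = (0, 0)  # (offset, running_sum)
    offset, total = checkpoint
    failed = False
    while offset < len(stream):
        total += stream[offset]
        offset += 1
        if offset == fail_after and not failed:
            failed = True
            offset, total = checkpoint  # crash: roll back input position and state together
            continue
        if offset % batch_size == 0 or offset == len(stream):
            checkpoint = (offset, total)
    return total


def count_with_replay_only(stream: List[int], fail_after: int) -> int:
    """State lives outside the checkpoint (e.g. an external counter that is not
    rolled back), so replaying un-acked input counts it twice: at least once."""
    acked = 0
    total = 0
    offset = 0
    failed = False
    while offset < len(stream):
        total += stream[offset]
        offset += 1
        if offset == fail_after and not failed:
            failed = True
            offset = acked  # crash before ack: replay from the last acked record
            continue
        acked = offset  # ack after successful processing
    return total


stream = [1, 2, 3, 4, 5]
print(count_with_checkpoints(stream, batch_size=2, fail_after=3))
print(count_with_replay_only(stream, fail_after=3))
```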
Datastores
Data Model
● Relational: schema, SQL, foreign keys/joins, OLTP/OLAP
● Key-Value: simple, scalable, cache
● Graph: complex relations; e.g. social graph, recommendation, fraud detection
● Document: schema-less, semi-structured queries; e.g. product catalogue, session data
● Also: Files, Time-Series
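For the time-series model, a common layout (in the spirit of Cassandra's partition key plus clustering column, which is the SMACK store) groups rows by a partition key such as sensor and day and keeps them sorted by timestamp, so a time-range read is one contiguous slice. The sketch below uses a hypothetical `TimeSeriesTable` class over in-memory lists; it is not Cassandra's API or storage engine.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

SECONDS_PER_DAY = 86_400


class TimeSeriesTable:
    """Toy wide-row layout: (sensor, day) partition -> rows sorted by timestamp."""

    def __init__(self) -> None:
        self.partitions: Dict[Tuple[str, int], List[Tuple[int, float]]] = defaultdict(list)

    def insert(self, sensor: str, ts: int, value: float) -> None:
        # Bucketing by day keeps each partition bounded in size.
        key = (sensor, ts // SECONDS_PER_DAY)
        bisect.insort(self.partitions[key], (ts, value))  # keep clustering order by ts

    def range(self, sensor: str, start: int, end: int) -> List[Tuple[int, float]]:
        """Rows with start <= ts < end; assumes start and end fall on the same day."""
        rows = self.partitions.get((sensor, start // SECONDS_PER_DAY), [])
        lo = bisect.bisect_left(rows, (start, float("-inf")))
        hi = bisect.bisect_left(rows, (end, float("-inf")))
        return rows[lo:hi]


table = TimeSeriesTable()
table.insert("s1", 120, 2.0)
table.insert("s1", 60, 1.0)   # out-of-order writes still end up sorted
table.insert("s1", 180, 3.0)
table.insert("s2", 60, 9.9)
print(table.range("s1", 60, 180))
```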
Demo Time
Pipeline: Generator → Kafka → Flink → Kafka → Display
1. Financial data created by generator
2. Written to Kafka topics
3. Kafka Topics consumed by Flink
4. Flink pipeline operates on Kafka data
5. Results written back into Kafka stream (another topic)
6. Results displayed
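The demo's data flow can be mimicked in-process to show the shape of the pipeline. This is a minimal sketch under loud assumptions: Python deques stand in for the two Kafka topics, a per-symbol moving average stands in for the (unspecified) Flink job, and `generator` produces random hypothetical prices.

```python
import random
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Trade = Tuple[str, float]  # (symbol, price)


def generator(n: int, seed: int = 42) -> List[Trade]:
    """Step 1: hypothetical financial data generator (random prices)."""
    rng = random.Random(seed)
    return [(rng.choice(["AAPL", "MSFT"]), round(rng.uniform(90, 110), 2)) for _ in range(n)]


def moving_average(input_topic: Deque[Trade],
                   output_topic: Deque[Tuple[str, float]], window: int = 3) -> None:
    """Steps 3-5: consume the input topic, compute a per-symbol moving average,
    and write each result to a second topic."""
    recent: Dict[str, Deque[float]] = defaultdict(lambda: deque(maxlen=window))
    while input_topic:
        symbol, price = input_topic.popleft()
        recent[symbol].append(price)
        output_topic.append((symbol, sum(recent[symbol]) / len(recent[symbol])))


input_topic: Deque[Trade] = deque(generator(10))  # Step 2: stand-in for writing to Kafka
output_topic: Deque[Tuple[str, float]] = deque()
moving_average(input_topic, output_topic)
for symbol, avg in output_topic:                   # Step 6: display
    print(f"{symbol}: {avg:.2f}")
```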
Keep it running!
SERVICE OPERATIONS
● Configuration Updates (e.g. scaling, re-configuration)
● Binary Upgrades
● Cluster Maintenance (e.g. backup, restore, restart)
● Monitor progress of operations
● Debug any runtime blockages
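Several of these operations (configuration updates, binary upgrades, restarts) are usually rolled out one node at a time so the service stays available. The sketch below shows that pattern with visible per-node progress and a halt on the first failed health check; `rolling_update`, the status strings and the node names are hypothetical, loosely in the spirit of how orchestrators roll out changes, not the DC/OS API.

```python
from typing import Callable, Dict, List


def rolling_update(nodes: List[str], apply: Callable[[str], None],
                   healthy: Callable[[str], bool]) -> Dict[str, str]:
    """Update one node at a time; stop at the first node that fails its health check."""
    status = {n: "PENDING" for n in nodes}
    for node in nodes:
        status[node] = "IN_PROGRESS"  # progress is observable while the operation runs
        apply(node)
        if not healthy(node):
            status[node] = "FAILED"   # halt the rollout so an operator can debug the blockage
            break
        status[node] = "COMPLETE"
    return status


versions: Dict[str, str] = {}
nodes = ["node-0", "node-1", "node-2", "node-3"]
result = rolling_update(nodes,
                        apply=lambda n: versions.__setitem__(n, "v2"),
                        healthy=lambda n: n != "node-2")  # hypothetical failure on node-2
print(result)
```

Halting instead of continuing leaves most of the cluster on the old, known-good version, which keeps the blast radius of a bad upgrade small.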
See you at CODEMOTION MILAN, November 10/11th, 2017: http://milan2017.codemotionworld.com/