Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
![Page 1: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/1.jpg)
Real-Time log analysis with Mesos, Docker, Kafka, Spark, Cassandra and Solr at scale
![Page 2: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/2.jpg)
whoami
CEO of Elodina (http://www.elodina.net/), a big data as a service platform built on top of open source software. The Elodina platform enables customers to analyze data streams and programmatically react to the results in real time. We solve today’s data analytics needs by providing the tools and support necessary to utilize open source technologies. As users, contributors and committers, Elodina also provides support for frameworks that run on Mesos, including Apache Kafka, Exhibitor (ZooKeeper), Apache Storm, Apache Cassandra and a whole lot more!
Apache Kafka Committer & PMC Member
LinkedIn: http://linkedin.com/in/charmalloc
Twitter: @allthingshadoop
© 2015. All Rights Reserved.
![Page 3: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/3.jpg)
1 Intro To Mesos, Kafka, Etc
2 Architecture Overview
3 Breaking it down into pieces
4 Questions?
![Page 4: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/4.jpg)
Apache Mesos
![Page 5: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/5.jpg)
Mesos Papers
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf
Google Borg - https://research.google.com/pubs/pub43438.html
Google Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Schwarzkopf.pdf
![Page 6: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/6.jpg)
Static Partitioning
![Page 7: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/7.jpg)
Static Partitioning
![Page 8: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/8.jpg)
Static Partitioning
![Page 9: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/9.jpg)
Static Partitioning
![Page 10: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/10.jpg)
Fine Grained Resource Elasticity
"If people knew how low it really is, we’d all get fired."
https://gigaom.com/2013/11/30/the-sorry-state-of-server-utilization-and-the-impending-post-hypervisor-era/
![Page 11: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/11.jpg)
An operating system for your data center
![Page 12: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/12.jpg)
EVERYTHING ON MESOS
![Page 13: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/13.jpg)
How it works
![Page 14: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/14.jpg)
Marathon
https://github.com/mesosphere/marathon
Cluster-wide init and control system for services in cgroups or Docker, based on Apache Mesos
![Page 15: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/15.jpg)
Docker on Marathon
    {
      "id": "basic-3",
      "cmd": "python3 -m http.server 8080",
      "cpus": 0.5,
      "mem": 32.0,
      "container": {
        "type": "DOCKER",
        "docker": {
          "image": "python:3",
          "network": "BRIDGE",
          "portMappings": [
            { "containerPort": 8080, "hostPort": 0 }
          ]
        }
      }
    }
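An app definition like the one above could also be built and submitted programmatically. The following is a minimal sketch, assuming Marathon's REST endpoint `POST /v2/apps`; the address `marathon.example.com:8080` and the helper names are hypothetical. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Hypothetical Marathon address (an assumption); replace with your own.
MARATHON_URL = "http://marathon.example.com:8080"


def build_app_definition(app_id, image, container_port, cmd, cpus=0.5, mem=32.0):
    """Return an app definition shaped like the example on the slide."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 leaves the choice of host port to Marathon.
                "portMappings": [
                    {"containerPort": container_port, "hostPort": 0}
                ],
            },
        },
    }


def build_create_request(app, base_url=MARATHON_URL):
    """Build (but do not send) the POST /v2/apps request."""
    return urllib.request.Request(
        url=base_url + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


app = build_app_definition(
    "basic-3", "python:3", 8080, "python3 -m http.server 8080"
)
request = build_create_request(app)
# urllib.request.urlopen(request) would submit it to a running Marathon.
```

Keeping the definition as a plain dictionary makes it easy to template several services from one function before serializing them to JSON.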
![Page 16: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/16.jpg)
Apache Kafka
![Page 17: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/17.jpg)
Kafka papers
Apache Kafka was first open sourced by LinkedIn in 2011.
Papers
● Building a Replicated Logging System with Apache Kafka http://www.vldb.org/pvldb/vol8/p1654-wang.pdf
● Kafka: A Distributed Messaging System for Log Processing http://research.microsoft.com/en-us/um/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
● Building LinkedIn’s Real-time Activity Data Pipeline http://sites.computer.org/debull/A12june/pipeline.pdf
● The Log: What Every Software Engineer Should Know About Real-time Data's Unifying Abstraction http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
http://kafka.apache.org/
![Page 18: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/18.jpg)
How Big Data Starts
![Page 19: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/19.jpg)
More Big Data! More!
![Page 20: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/20.jpg)
uhhhh
![Page 21: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/21.jpg)
eeesh
![Page 22: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/22.jpg)
Kafka de-couples data pipelines
![Page 23: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/23.jpg)
Distributed Replicated Log
Read & Write
In real time
As much as you want
As fast as your network
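The log abstraction can be illustrated with a toy model. The sketch below is not Kafka and does not model replication or partitions; the class names `ToyLog` and `ToyConsumer` are made up for illustration. It only shows the core idea: writers append, and each reader tracks its own offset, so readers proceed at their own pace without affecting each other.

```python
class ToyLog:
    """An in-memory, append-only log: a toy model, not Kafka itself."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class ToyConsumer:
    """Keeps its own position in the log, independent of other consumers."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=10):
        """Read the next batch and advance this consumer's offset."""
        batch = self._log.read(self.offset, max_records)
        self.offset += len(batch)
        return batch


log = ToyLog()
for line in ["boot", "login ok", "disk 91% full"]:
    log.append(line)

fast, slow = ToyConsumer(log), ToyConsumer(log)
fast_batch = fast.poll()               # reads everything written so far
slow_batch = slow.poll(max_records=1)  # reads at its own pace
```

Because reading never removes records, a new consumer can be attached later and replay the log from offset 0.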
![Page 24: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/24.jpg)
Reference Architecture
![Page 25: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/25.jpg)
Producers
syslog → Kafka via docker https://hub.docker.com/r/stealthly/syslog/
syslog → Kafka scheduler https://github.com/stealthly/syslog-service
statsd → Kafka scheduler https://github.com/stealthly/statsd-mesos-kafka
system stats collection → Kafka scheduler https://github.com/stealthly/syscol
tailf → Kafka https://github.com/stealthly/go_kafka_client/tree/master/producers/tailf
Any language https://cwiki.apache.org/confluence/display/KAFKA/Clients
![Page 26: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/26.jpg)
Reference Architecture
![Page 28: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/28.jpg)
Kafka on Mesos
• smart broker.id assignment.
• preservation of broker placement (through constraints and/or new features).
• ability to do configuration changes.
• rolling restarts (for things like configuration changes).
• scaling the cluster up and down with automatic, programmatic and manual options.
• smart partition assignment via constraints vis-à-vis roles, resources and attributes.
![Page 29: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/29.jpg)
CLI & REST API
• scheduler - starts the scheduler.
• broker
  – add - adds one or more brokers to the cluster.
  – update - changes resources, constraints or broker properties of one or more brokers.
  – remove - takes a broker out of the cluster.
  – start - starts a broker up.
  – stop - either a graceful shutdown or a force kill (./kafka-mesos.sh help stop).
• topic
  – list - lists topics in the cluster.
  – add - adds new topics to the cluster.
  – update - changes topics in the cluster.
  – rebalance - rebalances a cluster by selecting either the brokers or the topics to rebalance. Manual assignment is still possible using the Apache Kafka project tools. Rebalance can also change the replication factor on a topic.
• help - ./kafka-mesos.sh help || ./kafka-mesos.sh help {command}
![Page 30: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/30.jpg)
Reference Architecture
![Page 31: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/31.jpg)
Schema: Avro or Protobuf
• https://github.com/stealthly/go_kafka_client/blob/master/syslog/syslog_proto/logline.proto
• https://github.com/stealthly/go_kafka_client/blob/master/logline.avsc

logline
• line
• logtypeid
• source
• tags (k/v pairs)
• timings (k/v pairs)
![Page 32: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/32.jpg)
Consume from Kafka → Write to Cassandra
Implement CQL write here https://github.com/stealthly/go_kafka_client/blob/master/consumers/consumers.go#L186-L194 with https://github.com/gocql/gocql
The Go Kafka Client fans out work processing, and a rebalance doesn’t upset consumers that are already reading.
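The consume-then-write step might look roughly like the sketch below. It is written in Python rather than the Go of go_kafka_client, and it does not use gocql or any Kafka client: the keyspace, table and column names are hypothetical, and `fake_write` is a stub standing in for a real Cassandra session. It only illustrates fanning consumed records out to a pool of workers that each issue a parameterized CQL insert.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyspace, table and column names (assumptions, not from the talk).
INSERT_CQL = (
    "INSERT INTO logs.loglines (source, logtypeid, line) VALUES (?, ?, ?)"
)


def to_cql_params(record):
    """Map a consumed record onto the insert's bind parameters."""
    return (record["source"], record["logtypeid"], record["line"])


def fan_out(records, write, workers=4):
    """Hand each consumed record to a worker pool; return how many were written."""

    def handle(record):
        write(INSERT_CQL, to_cql_params(record))
        return 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(handle, records))


written = []


def fake_write(statement, params):
    """Stub for a real Cassandra write; just records what would be sent."""
    written.append((statement, params))


records = [
    {"source": "web-1", "logtypeid": 1, "line": "GET /"},
    {"source": "web-2", "logtypeid": 1, "line": "POST /login"},
]
count = fan_out(records, fake_write)
```

Passing the write function in as a parameter keeps the fan-out logic testable without a running Cassandra cluster.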
![Page 33: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/33.jpg)
Reference Architecture
![Page 34: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/34.jpg)
Sample Spark Job → Cassandra
https://github.com/stealthly/gauntlet
Uses the Cassandra Spark Connector https://github.com/datastax/spark-cassandra-connector
![Page 35: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/35.jpg)
Use DataStax Enterprise to enable Search
http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/srch/srchOverview.html
![Page 37: Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra](https://reader035.vdocuments.mx/reader035/viewer/2022062823/58cf8bfd1a28abe01d8b68e1/html5/thumbnails/37.jpg)
Thank you