![Page 1: DECK36 - Log everything! and Realtime Datastream Analytics with Storm](https://reader036.vdocuments.mx/reader036/viewer/2022081414/54b6df254a7959f4118b464f/html5/thumbnails/1.jpg)
Dr. Stefan Schadwinkel and Mike Lohmann
Who we are.
Log everything
Mike Lohmann, Architecture
Author (PHPMagazin, IX, heise.de)
Dr. Stefan Schadwinkel, Analytics
Author (heise.de, Cereb. Cortex, EJN, J. Neurophysiol.)
Agenda.
What we did. What we do.
Log everything! - Our way from Requirement to Solution
Infrastructure and technologies: Simple, Scalable, Open Source
Happy business users.
What we did.
Creating & operating education communities
Web applications
Multi-language
Different market rules in different countries
Consolidating the technological basis for multiple (new) products
DECK36 GmbH & Co. KG
DECK36 is a young spin-off from ICANS
7 core engineers with longstanding expertise
(operate, scale, automate, analyze)
Consulting and engineering services for the
etruvian group and external customers
PokerStrategy.com in numbers
Education since 2005
6,000,000 registered users
19 languages
2,800,000 PI (page impressions)/day
700,000 posts/day
7,600,000 requests/day
Moving on…
Build more Education communities like PokerStrategy…
Assume PokerStrategy KPIs(?)
Other Business models
Add mobile and the social web…
Our requirement: Log everything!
Logging Tools / Technologies
Producer
Web/Mobile Apps
JS Frontend
Servers
Databases
04/10/2023
Transport
Now: RabbitMQ + Erlang consumer
OR
Kafka + any other consumer
Was: Flume
Storage
Now: S3 storage + Hadoop with EMR
OR
Any other storage
Was: Virtualized in-house Hadoop
Analytics
MapReduce with Hive/Pig
Results in any format: Excel, QlikView, RDBMS, ...
Realtime Datastream Analytics
Storm / Trident
Logging Infrastructure
[Diagram: logging infrastructure with the stages Producer, Transport, Storage and Analytics. Components shown: Apps 1-x, databases and servers, RabbitMQ, consumer, S3, Hadoop cluster, RDBMS, Graylog, Zabbix, NodeJS, and result tools (Excel, QlikView, Tableau, SASS, ...). The Realtime Datastream Analytics (Storm) part shows Nimbus (master), three ZooKeeper nodes, three Supervisors and three Workers.]
Producer
[Diagram: producer flow for a request to /Home. Labels: PageController, PageHitEvent, Listener, Monolog-Logger (Logger::log()) with Processor, Handler and Formatter, LogMessage (JSON), local RabbitMQ, Shovel.]
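The producer slide names a chain of Listener, Processor, Formatter and Handler that turns a page hit into a JSON log message for the local RabbitMQ. The following is only a rough sketch of that idea in Python, not the Monolog or LoggingComponent API: all function names, the record fields and the `web-01` host value are assumptions made for illustration, and the handler merely collects messages in a list instead of talking to a broker.

```python
import json
from datetime import datetime, timezone


def add_request_context(record):
    """Processor step: enrich the record with extra fields (hypothetical)."""
    enriched = dict(record)
    # "web-01" is a made-up host name used only for this sketch.
    enriched["context"] = dict(enriched.get("context", {}), host="web-01")
    return enriched


def format_as_json(record):
    """Formatter step: serialize the record into a JSON log message."""
    return json.dumps(record, sort_keys=True)


class QueueHandler:
    """Handler step: stands in for publishing to the local RabbitMQ."""

    def __init__(self):
        self.published = []

    def handle(self, message):
        # A real handler would publish to a broker; here we only collect.
        self.published.append(message)


def log_page_hit(path, handler, now=None):
    """Listener reacting to a page hit: build, process, format, hand off."""
    now = now or datetime.now(timezone.utc)
    record = {
        "type": "PageHitEvent",
        "path": path,
        "timestamp": now.isoformat(),
    }
    message = format_as_json(add_request_context(record))
    handler.handle(message)
    return message


handler = QueueHandler()
log_page_hit("/Home", handler, now=datetime(2012, 10, 1, tzinfo=timezone.utc))
print(handler.published[0])
```

Keeping the processor, formatter and handler as separate steps means each can be swapped independently, for example a different transport without touching the message format.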
Producer JS (in progress)
[Diagram: JS producer flow for a visit to /Home. Labels: JS Client, Tracks Event, Local Storage, Trigger, WebSocket, DataCollector (NodeJS), Validator, local RabbitMQ, Shovel.]
Producer
LoggingComponent: Provides interfaces, filters and handlers
LoggingBundle: Glues all together with Symfony2
Drupal Logging Module: Using the LoggingComponent
JS Frontend Client: LogClient for Browsers (in progress)
https://github.com/ICANS/IcansLoggingComponent
https://github.com/ICANS/IcansLoggingBundle
https://github.com/ICANS/drupal-logging-module
https://github.com/DECK36/starlog-js-frontend-client
Transport
1st Solution: Flume
+ Part of the Hadoop Ecosystem
+ Flexible Central config, Extensible via Plugins
- Not mature software (flume, flume-ng, plugin interfaces, ..)
- Central config has problems with puppet
2nd Solution: RabbitMQ
+ Local RabbitMQ Cluster
+ Decentralized config (producers & consumers simply connect)
- HDFS Sink not pre-packaged
Storage
1st Solution: Self-hosted Hadoop
- Virtualized Infrastructure makes HDFS redundant
- High costs (cluster always running, admin work)
2nd Solution: Cloud Storage
+ Amazon S3
+ Elastic MapReduce: Hadoop on demand
+ Cost-effective (pay only for what you use)
Compaction
RabbitMQ consumer (Erlang) stores data to cloud
Yet: we have a mixed message stream, but want:
s3://[BUCKET]/icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01/part-00000.lzo
MapReduce:
Streaming (stdin/stdout to any tool)
Computation (Hive, Pig, Cascalog, etc.)
Amazon Redshift
PostgreSQL-compatible Data Warehouse
Hive Partitioning!
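To illustrate the compaction goal, here is a minimal Python sketch of how a mixed message stream could be split into Hive-style partition prefixes such as `icanslog/[WEBSITE]/icans.content/year=2012/month=10/day=01`. It is not the Cascalog job used by the authors. The message fields (`website`, `type`, `timestamp`), the website name `example` and the second message type `icans.user` are assumptions for the example.

```python
from datetime import datetime


def partition_key(website, message_type, timestamp):
    """Build the Hive-style partition prefix for one log message."""
    return (
        f"icanslog/{website}/{message_type}/"
        f"year={timestamp:%Y}/month={timestamp:%m}/day={timestamp:%d}"
    )


def group_by_partition(messages):
    """Split a mixed message stream into one bucket per partition."""
    partitions = {}
    for msg in messages:
        ts = datetime.fromisoformat(msg["timestamp"])
        key = partition_key(msg["website"], msg["type"], ts)
        partitions.setdefault(key, []).append(msg)
    return partitions


# Hypothetical sample messages; field names are assumptions.
stream = [
    {"website": "example", "type": "icans.content",
     "timestamp": "2012-10-01T08:15:00+00:00"},
    {"website": "example", "type": "icans.user",
     "timestamp": "2012-10-02T00:01:00+00:00"},
    {"website": "example", "type": "icans.content",
     "timestamp": "2012-10-01T23:59:00+00:00"},
]

for key, msgs in sorted(group_by_partition(stream).items()):
    print(key, len(msgs))
```

Because the `year=`, `month=` and `day=` segments follow Hive's partition naming, a partitioned Hive table can read only the directories a query actually needs.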
Analytics
Cascalog is Clojure, Clojure is Lisp
(?<- (stdout) [?person] (age ?person ?age) … (< ?age 30))
?<- : query operator
(stdout) : Cascading output tap
[?person] : columns of the dataset generated by the query
(age ?person ?age) : "generator"
(< ?age 30) : "predicate"
Generators and predicates: as many as you want; both can be any Clojure function.
Clojure can call anything that is available within a JVM.
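For readers unfamiliar with Lisp syntax, the visible parts of the query can be read as "emit every person whose age is below 30". The sketch below shows roughly the same logic in plain Python. The sample data is made up, and the step elided by "…" in the query is not reproduced.

```python
# Stand-in for the `age` generator: (person, age) tuples. Sample data only.
AGE = [("alice", 28), ("bob", 33), ("chris", 40), ("david", 25)]


def people_younger_than(age_tuples, limit):
    """Keep the person column for every tuple that passes the predicate."""
    return [person for person, age in age_tuples if age < limit]


# Rough analogue of writing the result to the (stdout) tap.
for person in people_younger_than(AGE, 30):
    print(person)
```

The difference in practice is that Cascalog compiles such a query into MapReduce jobs, so the same declarative form scales from a test dataset to a cluster.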
• We use Cascalog to preprocess and organize the incoming flow of log messages:
Let's run the Cascalog processing on Amazon EMR:
./elastic-mapreduce --create --name "Log Message Compaction" \
  --bootstrap-action s3://[BUCKET]/mapreduce/configure-daemons \
  --num-instances $NUM \
  --slave-instance-type m1.large \
  --master-instance-type m1.large \
  --jar s3://[BUCKET]/mapreduce/compaction/icans-cascalog.jar \
  --step-action TERMINATE_JOB_FLOW \
  --step-name "Cascalog" \
  --main-class icans.cascalogjobs.processing.compaction \
  --args "s3://[BUCKET]/incoming/*/*/*/","s3://[BUCKET]/icanslog","s3://[BUCKET]/icanslog-error"
Now we can access the log data within Hive and store results again to S3:
Now, get the stats by executing a query:
We can now simply copy the data from S3 and import it into any local analytical tool:
Excel, Redshift, QlikView, R, etc.
Realtime Datastream Analytics
• Storm: Hadoop for realtime analytics
• Rock solid HA concept
• Highly scalable
• Can:
  • Process streams (and trigger events)
  • Provide DRPC functionality
  • Work on enormous data loads
• Fancy names for modules (spouts/bolts/tuple/topology)
• Easy to use:
  • Small and easy to understand API
  • DevMode
• Add new topologies at run time
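The module names can be made concrete with a toy example. The following Python sketch is not the Storm API and runs in a single process. It only mirrors the roles: a spout emits tuples, a bolt processes them, and a topology wires the two together. The event fields and the counting logic are assumptions chosen for illustration.

```python
from collections import Counter


def page_hit_spout(events):
    """Spout: the source of the stream, emitting one tuple per event."""
    for event in events:
        yield (event["website"], event["path"])


class CountBolt:
    """Bolt: consumes tuples and keeps a running count per tuple value."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        self.counts[tup] += 1
        # Emit the tuple together with its updated count.
        return tup, self.counts[tup]


def run_topology(events, bolt):
    """Topology: wires the spout to the bolt and drains the stream."""
    return [bolt.execute(tup) for tup in page_hit_spout(events)]


# Hypothetical page-hit events.
events = [
    {"website": "a", "path": "/Home"},
    {"website": "a", "path": "/Home"},
    {"website": "b", "path": "/Home"},
]

bolt = CountBolt()
for tup, count in run_topology(events, bolt):
    print(tup, count)
```

In a real cluster the spout and bolt instances run in parallel on different workers, which this single-process sketch deliberately leaves out.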
Happy business users!
Questions they ask often can be automated (ETL, reports)
New questions can be explored (ad hoc, search)
Insights can be fed back into the system (decisions, WebSockets)
Data-driven applications can be created that are shared by multiple websites or tailored to individual needs.
Thank you.
Questions?
Contacts.
Dr. Stefan Schadwinkel
ICANS_StScha
Mike Lohmann
mikelohmann
Tools/Technologies
DECK36 GmbH & Co. KG
Valentinskamp 18
20354 Hamburg
Germany
Phone: +49 40 22 63 82 9-0
Fax: +49 40 38 67 15 92
Web: www.deck36.de