![Page 1: Stream me up, Scotty: Experiences of integrating event-driven approaches into … · 2019-03-13 · Stream me up, Scotty: Experiences of integrating event-driven approaches into analytic](https://reader034.vdocuments.mx/reader034/viewer/2022042219/5ec56755b8e2bb797f460db4/html5/thumbnails/1.jpg)
Stream me up, Scotty: Experiences of integrating event-driven approaches into analytic data platforms
Dr. Dominik Benz, Head of Machine Learning Engineering, inovex GmbH
Confluent Streaming Workshop Cologne / Hamburg, November 2018
Integrate existing (batch) data sources?
Check consistency with data sources?
Build realtime data visualizations?
https://flic.kr/p/5eQA7e
https://flic.kr/p/bpFt7U
Stream me up ..
› Analytic (Streaming) Data Platforms
› Integrating existing (batch) data sources
› Checking consistency
› Building realtime visualizations
› Wrap up & Summary
A typical analytic data platform
› Layers: ingress, raw, processed, datahub, analysis, egress
› Cross-cutting concerns: scheduling, orchestration, metadata; user access, system integration, development
› Components: Flat files, Databases, APIs, ...; Batch Processing (Spark, Hive, ..); (Hive) Tables; Airflow, Hive Metastore; SQL, Notebooks (Zeppelin, ..)
A typical (?) streaming data platform
› Layers: ingress, raw, processed, datahub, analysis, egress
› Cross-cutting concerns: scheduling, orchestration, metadata; user access, system integration, development
› Components: Input Data (Streams); Kafka Connect; Stream Processing (Kafka Streams, Nifi, ..); (Kafka) Topics, KTables, ..; (Confluent) Schema Registry; KSQL
Integrating web tracking
Diagram: company website → tracking pixel → tracking service → raw tracking data
Integrating web tracking: setup / constraints
› Hortonworks-based platform, including Nifi and Confluent Platform
› Apache Airflow is the established scheduling / workflow tool, integrated into monitoring, alerting, ..
› Tracking Service: currently a batch-oriented API (request data, get download links, ..), but a click event stream is planned
› Developers / Analysts with mixed backgrounds w.r.t. programming skills
Apache Nifi in a Nutshell
› drag-and-drop visual definition of data pipelines
› various built-in connectors (file, stream, database, service, ...)
› event-based processing paradigm
› built-in queues, data provenance, backpressure handling, registry, ...
› focus: ingest & lightweight (!) transformation
› not a complex event processor (like Kafka Streams, Flink, Spark Streaming, ...)
› integrated into HDP stack
Apache Airflow in a nutshell
› Python library to define & schedule batch workflows
› programmatic specification of a "DAG" (= tasks + dependencies)
› clean handling of job run metadata (success, duration, ..)
› developed by Airbnb, open-sourced 2015
› built-in standard operators (bash, hive, spark, kubernetes, ..)
› easily extensible (custom operators, ..)
› once used -> never Oozie again :)
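To make "DAG = tasks + dependencies" concrete, here is a minimal plain-Python sketch. It deliberately does not use the Airflow API; it only shows how a set of tasks with declared dependencies determines a valid execution order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical batch workflow: each task maps to the set of tasks it depends on.
dag = {
    "download": set(),
    "parse_clicks": {"download"},
    "parse_sessions": {"download"},
    "load_hive": {"parse_clicks", "parse_sessions"},
}


def execution_order(dependencies):
    """Return one valid execution order in which dependencies always run first."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(dag)
print(order)
```

A scheduler such as Airflow adds what this sketch leaves out: run metadata per task, retries, and time-based triggering.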
Integrating web tracking: options
Diagram: tracking service → tracking data
Option / Aspects
› Airflow only
  + integrated into monitoring, ..
  + job status handling, reloading
  - not prepared for future stream API
  - handling file content complicated
› Unified Abstraction (e.g. Apache Beam)
  + one model for batch / stream ingest
  - comparatively high entry barrier
› Nifi only
  + visual pipeline definition
  + easy handling of file content
  + event-based paradigm
  + operators available
  - custom status handling, reloading
› Kafka-Connect
  + fault-tolerant
  + scalable setup
  - custom connector coding
  - custom status handling, reloading
Integrating web tracking: chosen solution – Airflow + Nifi
› Combines advantages of Airflow & Nifi
› Prepared for future streaming API
› Integrated into monitoring, alerting, ..
› Status handling / reloading easy
Diagram: tracking service; trigger (hourly) download → check status (sensors) → trigger, fetch download links → download, process, store data
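The trigger / check status / fetch links sequence can be sketched in plain Python as follows. This is an illustration under assumptions, not the actual implementation: `FakeTrackingService` and all of its methods are hypothetical stand-ins for the batch-oriented tracking API, and the polling loop only mimics what a scheduler sensor does. The download / process / store step is left out.

```python
import time


class FakeTrackingService:
    """Hypothetical stand-in for the batch-oriented tracking API."""

    def __init__(self, polls_until_ready=2):
        self._remaining = polls_until_ready

    def request_export(self, hour):
        # A real service would start preparing the export for this hour.
        return f"export-{hour}"

    def is_ready(self, export_id):
        # Report "ready" only after a few status checks.
        self._remaining -= 1
        return self._remaining <= 0

    def download_links(self, export_id):
        return [f"https://tracking.example/{export_id}/part-0.csv"]


def wait_until_ready(service, export_id, poke_interval=0.0, max_pokes=10):
    """Sensor-style polling: re-check the status until ready or give up."""
    for _ in range(max_pokes):
        if service.is_ready(export_id):
            return True
        time.sleep(poke_interval)
    return False


def hourly_run(service, hour):
    export_id = service.request_export(hour)      # trigger (hourly) download
    if not wait_until_ready(service, export_id):  # check status (sensor)
        raise TimeoutError(f"{export_id} not ready")
    return service.download_links(export_id)      # fetch download links


links = hourly_run(FakeTrackingService(), "2018-10-01T11")
print(links)
```

Keeping the status check separate from the trigger is what makes reloading a single hour straightforward: a rerun simply repeats the same three steps for that hour.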
Checking consistency: Customer Consent
› customer portal: grants / revokes consent
› customer (consent) database: stores consent
› kafka: consent event, writes consent to hive
› hive vs. customer (consent) database: in sync?
https://flic.kr/p/9yHuk8
Checking consistency: setup / constraints
› Analysts need up-to-date version of customer consent information in platform
› Hard correctness requirements (especially regarding revoked consent)
› Continuous monitoring of correctness
› Alerting in case of differences
Checking Consistency: Statistics Events
Diagram: customer portal → kafka
› use existing channel (kafka)
› source injects periodic "statistics events" into the stream, each with a defined measure point (in time)
{type:GRANT, cid:12, ts:2018-10-01 11:00:00 ..}
{type:GRANT, cid:10, ts:2018-10-01 11:01:00 ..}
{type:REVOK, cid:09, ts:2018-10-01 11:01:05 ..}
{type=STAT, measure_ts=2018-10-01 11:01:20, stats={num_consent_v1:72625, num_consent_v2: 6252, ..}}
(events ordered by time)
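A minimal sketch of the source side, assuming the source keeps running counts per consent version and emits a STAT event at fixed intervals. The `version` field, the interval and the initial counts are assumptions made for illustration; the slide only shows the resulting event shapes.

```python
from datetime import datetime, timedelta


def with_statistics_events(events, initial_counts, interval):
    """Yield the source events and interleave periodic STAT events.

    `events` are dicts with `type` (GRANT/REVOK), `version` and `ts`,
    ordered by `ts`. A STAT event covers all events with ts <= measure_ts.
    """
    counts = dict(initial_counts)
    next_stat = None
    for event in events:
        if next_stat is None:
            next_stat = event["ts"] + interval
        # Emit every STAT event whose measure point lies before this event.
        while event["ts"] > next_stat:
            yield {"type": "STAT", "measure_ts": next_stat, "stats": dict(counts)}
            next_stat += interval
        key = f"num_consent_v{event['version']}"
        counts[key] = counts.get(key, 0) + (1 if event["type"] == "GRANT" else -1)
        yield event


t0 = datetime(2018, 10, 1, 11, 0, 0)
source_events = [
    {"type": "GRANT", "cid": 12, "version": 1, "ts": t0},
    {"type": "GRANT", "cid": 10, "version": 1, "ts": t0 + timedelta(seconds=60)},
    {"type": "REVOK", "cid": 9, "version": 2, "ts": t0 + timedelta(seconds=65)},
    {"type": "GRANT", "cid": 7, "version": 2, "ts": t0 + timedelta(seconds=130)},
]
stream = list(
    with_statistics_events(
        source_events,
        initial_counts={"num_consent_v1": 10, "num_consent_v2": 5},
        interval=timedelta(seconds=80),
    )
)
for item in stream:
    print(item)
```

Because the STAT event travels through the same channel as the consent events, it arrives at the target in order relative to the events it summarizes.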
Checking Consistency: Evaluate Statistics Event
› perform count on target side (Hive) up to $measurePoint
› compare counts
› counts = simple plausibility check, but more elaborate checks (hashes) are conceivable
{type=STAT, measure_ts=2018-10-01 11:01:20, stats={num_consent_v1:72625, num_consent_v2: 6252, ..}}
in sync?
{measure_ts=2018-10-01 11:01:20, hive_stats={num_consent_v1:72625, num_consent_v2: 6252, ..}}
Customer (consent) database
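The evaluation step could look roughly like the sketch below. On the real platform the count would be a Hive query restricted to rows up to the measure point; here an in-memory list of rows stands in for the table, and the field names (`type`, `version`, `ts`) are illustrative assumptions.

```python
from datetime import datetime


def count_up_to(rows, measure_ts):
    """Count active consents per version using only rows up to measure_ts."""
    counts = {}
    for row in rows:
        if row["ts"] <= measure_ts:
            key = f"num_consent_v{row['version']}"
            counts[key] = counts.get(key, 0) + (1 if row["type"] == "GRANT" else -1)
    return counts


def differences(stat_event, target_rows):
    """Return {measure: (expected, actual)} for every mismatching count."""
    actual = count_up_to(target_rows, stat_event["measure_ts"])
    return {
        name: (expected, actual.get(name, 0))
        for name, expected in stat_event["stats"].items()
        if actual.get(name, 0) != expected
    }


target_rows = [
    {"type": "GRANT", "version": 1, "ts": datetime(2018, 10, 1, 11, 0, 0)},
    {"type": "GRANT", "version": 1, "ts": datetime(2018, 10, 1, 11, 1, 0)},
    # Arrived after the measure point, so it must not be counted.
    {"type": "GRANT", "version": 1, "ts": datetime(2018, 10, 1, 11, 2, 0)},
]
stat_event = {
    "type": "STAT",
    "measure_ts": datetime(2018, 10, 1, 11, 1, 20),
    "stats": {"num_consent_v1": 2},
}
print(differences(stat_event, target_rows))      # empty dict means "in sync"
print(differences(stat_event, target_rows[:1]))  # one row missing on the target
```

A non-empty result is the natural trigger for alerting; restricting the count to the measure point is what keeps late-arriving events from producing false alarms.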
Realtime visualizations: Online Shop Purchases
Diagram: online shop → purchase event (JMS) → normalization, filtering, aggregation, .. → realtime dashboard
https://flic.kr/p/9yHuk8
Realtime visualizations: setup / constraints
› Goal: timely insights into various purchase aspects (items bought last 5min, ..)
› flexible / configurable frontend (time window, aggregation dimension, ..)
› scalable to 100s / 1000s of dashboard users
› low latency of dashboard backend
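What "configurable time window and aggregation dimension" means for the backend can be shown with a small in-memory sketch. The event fields (`item`, `category`, `quantity`, `ts`) are assumptions for illustration; a production backend would push this work into a datastore rather than scan a Python list.

```python
from collections import Counter
from datetime import datetime, timedelta


def aggregate_window(events, now, window, dimension):
    """Sum purchased quantities per `dimension` value within the last `window`."""
    start = now - window
    totals = Counter()
    for event in events:
        if start < event["ts"] <= now:
            totals[event[dimension]] += event["quantity"]
    return dict(totals)


now = datetime(2018, 10, 1, 12, 0, 0)
purchases = [
    {"ts": now - timedelta(minutes=1), "item": "book", "category": "media", "quantity": 2},
    {"ts": now - timedelta(minutes=3), "item": "dvd", "category": "media", "quantity": 1},
    {"ts": now - timedelta(minutes=10), "item": "book", "category": "media", "quantity": 5},
    {"ts": now - timedelta(minutes=2), "item": "lamp", "category": "home", "quantity": 1},
]

by_item = aggregate_window(purchases, now, timedelta(minutes=5), "item")
by_category = aggregate_window(purchases, now, timedelta(minutes=5), "category")
print(by_item)
print(by_category)
```

Both the window and the dimension are plain parameters here, which is the flexibility the frontend needs; the scalability and latency requirements are what rule out a naive scan like this one in practice.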
Realtime visualizations: components / options
Layers: transport layer, processing, service backend, service API
› aggregation during processing: JMS → Kafka-connect → Kafka → Kafka-streams → Kafka-connect → HBase → Phoenix / JDBC → Spring Boot
› aggregation at query-time: JMS → Nifi → Kafka → Kafka-connect → HBase → Phoenix / JDBC → Spring Boot
› built-in, configurable aggregation: JMS → Nifi → Kafka → Tranquility → Druid → Spring Boot
Realtime visualizations: chosen solution
Diagram: JMS → Nifi → Kafka → Tranquility → Druid → Spring Boot
› Druid: time series database with focus on
  › Realtime ingestion, good Kafka integration
  › "slice-and-dice" queries
  › distributed scale-out architecture
› Event processing kept simple in Nifi
  › mainly cleaning, transformation
  › aggregation is pushed down to Druid
› But: yet another distributed system .. :(
  › Experiences good so far, but needs work / skills
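To illustrate what "aggregation is pushed down to Druid" can look like from the service backend, the sketch below builds a Druid native `groupBy` query as a plain dictionary. The data source, dimension and metric names are hypothetical, and this is not taken from the talk's actual backend; the resulting JSON would typically be posted to a Druid broker.

```python
import json
from datetime import datetime, timedelta


def build_groupby_query(data_source, dimension, metric, end, window):
    """Build a Druid native groupBy query summing `metric` per `dimension`."""
    start = end - window
    return {
        "queryType": "groupBy",
        "dataSource": data_source,
        "granularity": "all",  # one result bucket for the whole interval
        "dimensions": [dimension],
        "aggregations": [
            {"type": "longSum", "name": f"total_{metric}", "fieldName": metric}
        ],
        # ISO-8601 interval: start/end
        "intervals": [f"{start.isoformat()}/{end.isoformat()}"],
    }


query = build_groupby_query(
    data_source="purchases",  # hypothetical data source name
    dimension="item",         # hypothetical dimension
    metric="quantity",        # hypothetical metric
    end=datetime(2018, 10, 1, 12, 0, 0),
    window=timedelta(minutes=5),
)
print(json.dumps(query, indent=2))
```

Changing the dashboard's time window or aggregation dimension then only changes query parameters; no stream processing job has to be redeployed.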
The human factor ..
› Technology moves from batch to stream – what about people?
› Analysts' world = often batch world
  › tooling centered around static datasets
  › can (and must) be generated from streams
  › but: education towards stream / event-based thinking necessary!
› Incremental / stream-based data exchange = paradigm shift
  › efforts / commitment "from both ends" necessary
https://flic.kr/p/f2Wx6t
Stream me up, Scotty ..
The future is event-based, but on the way:
› Existing batch-oriented APIs
  › use (scheduled) event-based tools for easier later migration
› Checking consistency
  › inject plausibility checks into data stream
› Realtime visualizations
  › Druid + Kafka powerful and flexible combination
› Don't forget the human in the loop!
Thank you
Dr. Dominik Benz
inovex GmbH
Park Plaza
Ludwig-Erhard-Allee 6
76131 Karlsruhe