Onyx - Data Processing The Clojure Way By Bahadir Cambel @bahadircambel

Post on 25-Jan-2017

Page 1: Onyx   data processing  the clojure way

Onyx - Data Processing The Clojure Way

By Bahadir Cambel @bahadircambel

Raise your hands if you have used

- Cascalog

- Hadoop

- Spark

- Flink

- Samza

- Storm

- Sqoop

What is it good for?

Realtime event stream processing

Continuous computation

Extract, transform, load (ETL)

Data transformation à la map-reduce

Data ingestion and storage medium transfer

Data cleaning

Data Structures + Simple Functions

- Sounds familiar?

Hadoop

Spark

http://spark.apache.org/examples.html SCALA

JAVA dev having a productive day

Cascalog

http://cascalog.org/articles/getting_started.html

- ?- executes a query

- <- defines a query

- :> separates a predicate's inputs from its output variables

Internet scale: MongoDB could save you?!

Onyx

Workflow

- Articulates the paths that data flows through the cluster at runtime

- A directed acyclic graph (DAG)
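
A workflow of this kind can be pictured as a list of directed edges between task names. The following Python sketch only illustrates the idea and is not Onyx's API; the task names (`in`, `inc`, `out_a`, `out_b`) and the `topological_order` helper are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical workflow: each pair is a directed edge (from-task, to-task).
workflow = [("in", "inc"), ("inc", "out_a"), ("inc", "out_b")]


def topological_order(edges):
    """Return tasks so that every task appears after its upstream tasks.

    Raises ValueError if the edges contain a cycle, i.e. are not a DAG.
    """
    downstream = defaultdict(list)
    indegree = defaultdict(int)
    tasks = []
    for src, dst in edges:
        for task in (src, dst):
            if task not in tasks:
                tasks.append(task)
        downstream[src].append(dst)
        indegree[dst] += 1

    # Start from tasks with no incoming edges (the inputs).
    ready = deque(task for task in tasks if indegree[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in downstream[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("workflow contains a cycle")
    return order
```

Calling `topological_order(workflow)` walks the graph from the input task to the two outputs; a cyclic edge list is rejected, which is what "DAG" rules out.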

Catalog

- Describes and configures the tasks in the workflow

Function(s)

Flow Conditions

From -> To (if the predicate holds)

Flow conditions isolate the logic that decides whether a segment should pass from one task to the next in a workflow. They also handle exceptions and support a rich degree of composition with runtime parameterization.
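
The "From -> To, if the predicate holds" rule can be sketched in a few lines of Python. This is a conceptual model, not Onyx's flow-condition format; the task names, the `age` field, and the `route` helper are all made up for illustration.

```python
# Hypothetical flow conditions: a segment leaving `from` is sent to `to`
# only when the predicate returns True for that segment.
flow_conditions = [
    {"from": "parse", "to": "store_adults",
     "predicate": lambda segment: segment["age"] >= 18},
    {"from": "parse", "to": "store_minors",
     "predicate": lambda segment: segment["age"] < 18},
]


def route(source, segment, conditions):
    """Return the downstream tasks that should receive `segment`."""
    return [
        condition["to"]
        for condition in conditions
        if condition["from"] == source and condition["predicate"](segment)
    ]
```

Keeping the predicates outside the task functions is the point: the routing logic can change without touching the code that transforms segments.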

Windows / Triggers

Windows partition a possibly unbounded sequence of data into finite pieces, allowing aggregations to be specified.

Trigger types:

- Timer

- Segment

- Punctuation

- Watermark

- Percentile-Watermark

Job

A job is translated into multiple tasks, and peers execute those tasks.

If the number of tasks exceeds the number of available peers, the job will not complete (buy me a beer or 10).
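
The peer arithmetic is simple enough to check before submitting a job. A minimal Python sketch, assuming one peer per task as the lower bound; `missing_peers` is a hypothetical helper, not something Onyx provides.

```python
def missing_peers(tasks, available_peers):
    """Return how many more peers are needed to cover every task.

    Assumes each peer works on at most one task at a time, so a job
    needs at least as many peers as it has tasks.
    """
    return max(0, len(tasks) - available_peers)
```

A result of 0 means every task can be covered; anything above 0 is the number of peers still to be started.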

Bulk functions

Perform a function more efficiently over a batch of segments, rather than processing one segment at a time.

- Example: writing to a database. Onyx ignores the output of your function and passes the same segments it received downstream.

Group by

“like” values are always routed to the same virtual peer

- Group by key

- Group by a fn

Specify in the catalog!
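
One way to picture how "like" values end up on the same peer is hashing the grouping value and taking it modulo the number of peers. The Python sketch below assumes that scheme purely for illustration; it is not how Onyx is known to implement grouping, and `peer_for` is a hypothetical name.

```python
import zlib
from operator import itemgetter


def peer_for(segment, key_fn, n_peers):
    """Pick a peer index for `segment` based on its grouping value.

    `key_fn` extracts the grouping value: use itemgetter("user") to group
    by a key, or any other function to group by a computed value.
    A stable hash (CRC32) is used so the result is the same on every run.
    """
    value = repr(key_fn(segment)).encode("utf-8")
    return zlib.crc32(value) % n_peers


# Group by key: segments with the same "user" map to the same peer index.
by_user = itemgetter("user")

# Group by a function: segments are grouped by the first letter of "user".
by_initial = lambda segment: segment["user"][0]
```

Two segments with the same grouping value always yield the same index, which is the property that grouped aggregations rely on.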

Fixed Windows

A data point falls into exactly one instance of a window (often called an extent in the literature).

Between t1=0 and t2=4, how many events have happened?

Then t1=5 to t2=9, t1=10 to t2=14, and so on.
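
The extents on this slide (0 to 4, 5 to 9, 10 to 14) follow from integer division. Here is a small Python sketch, assuming non-negative integer timestamps and extents aligned at 0; the function names are hypothetical.

```python
from collections import Counter


def fixed_extent(t, window_range):
    """Return the (lower, upper) bounds, both inclusive, of the single
    fixed-window extent that contains timestamp `t`."""
    lower = (t // window_range) * window_range
    return (lower, lower + window_range - 1)


def count_per_extent(timestamps, window_range):
    """Count how many events fall into each fixed-window extent."""
    return dict(Counter(fixed_extent(t, window_range) for t in timestamps))
```

With a range of 5, `fixed_extent(7, 5)` gives `(5, 9)`, and `count_per_extent` answers the "how many events have happened?" question for every extent at once.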

Sliding Window

A slide value defines how long to wait before spawning a new window extent.

Between t1=0 and t2=14, how many events have happened?

t1=5, t2=19?

t1=10, t2=24?
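
Unlike a fixed window, a sliding window can place one data point in several overlapping extents. The Python sketch below reproduces the slide's numbers with a range of 15 and a slide of 5. It assumes non-negative integer timestamps and that the first extent starts at 0, and `sliding_extents` is a hypothetical helper.

```python
def sliding_extents(t, window_range, slide):
    """Return every (lower, upper) extent, bounds inclusive, that contains
    timestamp `t`. Extents start at multiples of `slide`, beginning at 0."""
    extents = []
    lower = (t // slide) * slide  # latest extent starting at or before t
    while lower >= 0 and lower + window_range - 1 >= t:
        extents.append((lower, lower + window_range - 1))
        lower -= slide
    return sorted(extents)
```

For example, `sliding_extents(12, 15, 5)` returns `[(0, 14), (5, 19), (10, 24)]`: an event at t=12 counts toward all three extents shown on the slide.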

Global Window

Session Window

Session windows dynamically resize their upper and lower bounds in reaction to incoming data.

Sessions capture a time span of activity for a specific key, such as a user ID. If no activity occurs within a timeout gap, the session closes. If an event occurs within the bounds of a session, the window is fused with the new event, and the session is extended by its timeout gap in either the forward or the backward direction.
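
The session rule can be sketched for a single key. The Python below is a simplification: it sorts all timestamps first and merges them in one pass, whereas a streaming system has to fuse sessions as events arrive, possibly out of order. The `sessions` helper and its arguments are hypothetical.

```python
def sessions(timestamps, timeout_gap):
    """Group the event timestamps of one key into sessions.

    An event joins the current session when it occurs within `timeout_gap`
    of that session's latest event; otherwise it opens a new session.
    Returns a list of (first_event, last_event) pairs.
    """
    result = []
    for t in sorted(timestamps):
        if result and t - result[-1][1] <= timeout_gap:
            result[-1][1] = t  # extend the current session
        else:
            result.append([t, t])  # start a new session
    return [tuple(session) for session in result]
```

With a gap of 5, the events `[1, 3, 4, 20, 22, 50]` fall into three sessions: `(1, 4)`, `(20, 22)` and `(50, 50)`.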

Aggregation

:onyx.windowing.aggregation/conj

:onyx.windowing.aggregation/count

:onyx.windowing.aggregation/sum

:onyx.windowing.aggregation/min

:onyx.windowing.aggregation/max

:onyx.windowing.aggregation/average

Architecture

Peer

A peer is a node in the cluster responsible for processing data.

Virtual Peer

A Virtual Peer refers to a single peer process running on a single physical machine. A single Virtual Peer executes at most one task at a time.

ZooKeeper

Watches Peers

Aeron

Efficient reliable UDP unicast, UDP multicast, and IPC message transport

The messaging layer takes care of the direct peer-to-peer transfer of segment batches, acks, segment completion, and segment retries to the relevant virtual peers.

The Log

Scheduling

If there is no master, how does scheduling work?

Peers contend to work on tasks.

Types of Job Schedulers

- Greedy (I need ALL!!!! Gimme all!!)

- Balanced Round Robin (fair play)

- Percentage (not so fair play)

Types of Task Schedulers

- Balanced

- Percentage

- Colocation (assigns a job's tasks to peers on a single physical machine: low latency, minimal network traffic)

Tags

A set of machines in your cluster is privileged.

Run certain tasks on specific machines.

Declare a peer with capabilities:

- Datomic

- Special Hardware (GPU, Memory)

- Network
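
Matching tasks to capable peers comes down to a subset check. The Python sketch below models that idea only; the peer names, the tag names, and the `eligible_peers` helper are hypothetical and do not reflect Onyx's configuration format.

```python
# Hypothetical cluster: each peer declares the capabilities (tags) it has.
peers = {
    "peer-1": {"datomic"},
    "peer-2": {"gpu", "datomic"},
    "peer-3": set(),
}


def eligible_peers(required_tags, peers):
    """Return the names of peers that have every tag the task requires."""
    required = set(required_tags)
    return sorted(name for name, tags in peers.items() if required <= tags)
```

A task that requires `["datomic"]` can run on `peer-1` or `peer-2`; a task with no required tags can run anywhere.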

Example - Let’s process some logs

Check out the repository

https://github.com/bcambel/onyx-test

End users configuring what workflows should look like.

- Language agnostic

- Location agnostic

- Tolerant to machine generation

- Temporally agnostic (should wait for a time to be realized)

If you are not enjoying your experience

There is something fundamentally wrong with the tool

Think about Apple’s smooth product experience.

Pain detected, thought through. (Pain -> Pleasure)

Tools

- Onyx ETL https://github.com/onyx-platform/onyx-etl

- Dashboard https://github.com/onyx-platform/onyx-dashboard

- Replica Cons https://github.com/onyx-platform/onyx-console-dashboard

- Ansible Playbook https://github.com/onyx-platform/ansible-onyx

- Metrics Suite https://github.com/onyx-platform/onyx-metrics

- Benchmark Suite https://github.com/onyx-platform/onyx-benchmark

- Jepsen https://github.com/onyx-platform/onyx-jepsen

Onyx Dashboard

https://github.com/onyx-platform/onyx-dashboard

Questions

- How does Onyx distribute reads (input tasks)? Parallelization?

- Evenly break up a database table into chunks that can be read by multiple peers

- Segments realized. Tasks created. Peers get into action.

Helpful Links

http://www.onyxplatform.org/

https://gitter.im/onyx-platform/onyx

https://clojurians.slack.com/messages/onyx

https://github.com/onyx-platform/onyx-examples

https://github.com/onyx-platform/learn-onyx

https://github.com/Yuppiechef/cqrs-server (Open source project using Onyx)

Shameless self plug

http://www.bahadir.io/

https://twitter.com/bahadircambel

https://www.strava.com/athletes/8258974

Thanks

Michael Drogalis - https://twitter.com/michaeldrogalis

Lucas Bradstreet - https://twitter.com/ghaz

Thanks for using Clojure!