introduction to apache nifi and storm

Post on 13-Apr-2017

Introduction to Apache NiFi & Storm

Jungtaek Lim

WHO AM I?

• Staff Software Engineer @ Hortonworks

• Remote worker

• Open source prosumer

• Committer of Jedis

• PMC member of Apache Storm

• Contributor to Apache projects (Spark, Zeppelin, Ambari, Calcite), Redis, and others

• Contact

• kabhwan@gmail.com

• Twitter / LinkedIn / Github / Facebook

• @heartsavior

[Figure: data flows between Sources, Regional Infrastructure, and Core Infrastructure. One end is labeled "constrained, high-latency, localized context"; the other is labeled "hybrid (cloud/on-premises), low-latency, global context".]

DATA IN MOTION IN HORTONWORKS DATAFLOW (HDF)

Source: http://ko.hortonworks.com/products/data-center/hdf/

What is Apache NiFi?

An easy to use, powerful, and reliable system to process and distribute data.

History of Apache NiFi

• Created by the United States National Security Agency (NSA)

• Originally named NiagaraFiles

• In 2014 the NSA submitted the source code to the Apache Software Foundation via the NSA Technology Transfer Program; the project entered incubation in December 2014

• Development of Apache NiFi continued at Onyara, Inc., a startup company

• Became an Apache Top-Level Project in July 2015

• Hortonworks acquired Onyara, Inc. in August 2015

Role of Apache NiFi

• Data acquisition and delivery

• Simple transformation and data routing

• Simple event processing

• End to end provenance

• Edge intelligence and bi-directional comms.

NOT intended to REPLACE ‘distributed computation engines’

(a.k.a. stream processing frameworks)

Features of Apache NiFi

Highly configurable

• Loss tolerant vs guaranteed delivery

• Low latency vs high throughput

• Dynamic prioritization

• Flow can be modified at runtime

• Back pressure
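
To make the back-pressure bullet concrete, here is a minimal sketch in plain Java. It does not use NiFi's API: the `Connection` class, its object-count threshold, and the `produce` helper are hypothetical names chosen for illustration. The idea it models is that a queue between two components stops accepting data once it holds a configured number of objects, and accepts data again once the consumer drains it.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class BackPressureSketch {

    /** Hypothetical model of a queue between two components; not NiFi's API. */
    static final class Connection {
        private final Queue<String> queue = new ArrayDeque<>();
        private final int objectThreshold;

        Connection(int objectThreshold) {
            this.objectThreshold = objectThreshold;
        }

        /** Back pressure applies once the queue holds at least the threshold number of objects. */
        boolean isBackPressureApplied() {
            return queue.size() >= objectThreshold;
        }

        /** The upstream side may enqueue only while back pressure is not applied. */
        boolean offer(String item) {
            if (isBackPressureApplied()) {
                return false;
            }
            queue.add(item);
            return true;
        }

        /** The downstream side removes one item, which may relieve back pressure. */
        String poll() {
            return queue.poll();
        }
    }

    /** Tries to enqueue the given number of items and returns how many were accepted. */
    static int produce(Connection connection, int attempts) {
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            if (connection.offer("item-" + i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Connection connection = new Connection(3);
        System.out.println(produce(connection, 5));             // prints 3
        System.out.println(connection.isBackPressureApplied()); // prints true
        connection.poll();
        System.out.println(connection.isBackPressureApplied()); // prints false
    }
}
```

In this sketch the producer is simply refused; a real system would more likely pause scheduling of the upstream component until the queue drains.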

More…

• Designed for extension

• Build your own processors and more

• Secure

• SSL, SSH, HTTPS, encrypted content, etc...

• Multi-tenant authorization and internal authorization/policy management

• MiNiFi subproject

• Reduces footprint to about 40 MB

What is Apache Storm?

A free and open source distributed realtime computation system.

History of Apache Storm

Source: http://hortonworks.com/blog/brief-history-apache-storm/

Concepts of Apache Storm

• Spout: a source of streams in a topology

• Bolt: a processing component, which can also act as a sink

• Stream: an unbounded sequence of tuples, defined with a schema

• Stream grouping: defines how a stream should be partitioned among the bolt's tasks

• Topology: the logic for a realtime application, represented as a DAG
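
The stream grouping concept above can be sketched in plain Java. This is a simplified model, not Storm's API: `GroupingSketch`, `fieldsGrouping`, and `partition` are hypothetical names, and hashing a single string key modulo the task count is an assumption made for illustration. It shows the property a fields-style grouping provides: tuples with the same key always reach the same task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class GroupingSketch {

    /** Picks a task index for a key; equal keys always map to the same task. */
    static int fieldsGrouping(String key, int numTasks) {
        return Math.floorMod(Objects.hashCode(key), numTasks);
    }

    /** Distributes the words of a stream among numTasks task-local lists. */
    static List<List<String>> partition(List<String> words, int numTasks) {
        List<List<String>> tasks = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new ArrayList<>());
        }
        for (String word : words) {
            tasks.get(fieldsGrouping(word, numTasks)).add(word);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("storm", "nifi", "storm", "kafka", "nifi", "storm");
        List<List<String>> tasks = partition(words, 3);
        for (int i = 0; i < tasks.size(); i++) {
            System.out.println("task " + i + ": " + tasks.get(i));
        }
    }
}
```

Because every occurrence of a word lands on one task, a per-task counter is enough for a word count. A shuffle-style grouping would instead spread tuples evenly without regard to the key.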

Core vs Trident

                     Core                     Trident
Computation unit     Record (tuple)           Micro-batch
Latency              Very low (sub-second)    High (up to batch size); similar to Spark Streaming
Delivery guarantee   At least once            Exactly once
API                  Compositional            Declarative
Stateful operator    Supported from v1.0.0    Core feature (exactly-once)
Windowing            Time (processing time, event time), count; tumbling window, sliding window
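
The difference between the tumbling and sliding windows named in the table can be shown with a small count-based sketch in plain Java. It does not use Storm's windowing API: `WindowSketch` and `countWindows` are hypothetical names, and emitting only full windows is a simplifying assumption. A tumbling window is the special case where the slide equals the window length, so windows never overlap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WindowSketch {

    /**
     * Splits events into count-based windows of the given length, advancing
     * by `slide` events each time. Only full windows are emitted.
     */
    static List<List<Integer>> countWindows(List<Integer> events, int length, int slide) {
        if (length <= 0 || slide <= 0) {
            throw new IllegalArgumentException("length and slide must be positive");
        }
        List<List<Integer>> windows = new ArrayList<>();
        for (int start = 0; start + length <= events.size(); start += slide) {
            windows.add(new ArrayList<>(events.subList(start, start + length)));
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 3, 4, 5, 6);
        // Tumbling: slide == length, so each event belongs to exactly one window.
        System.out.println(countWindows(events, 3, 3)); // prints [[1, 2, 3], [4, 5, 6]]
        // Sliding: slide < length, so consecutive windows overlap.
        System.out.println(countWindows(events, 3, 1)); // prints [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
    }
}
```

Time-based windows follow the same pattern, with the window length and slide expressed as durations over processing time or event time instead of event counts.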

Features of Apache Storm

• Supports a number of connectors (17 connectors in the master branch)

• Automatic back-pressure

• Distributed Cache

• Flux (constructing topologies via YAML)

• Distributed Log Search

• Dynamic Worker Profiling

• Dynamic Log Levels

• Topology Event Inspector

• Resource Aware Scheduler

• SQL (Experimental)

Future of Apache Storm

Apache Storm 2.0 and beyond

• Clojure to Java translation

• Unified Stream API with supporting exactly-once

• Rework Metrics feature

• Apache Beam runner

• Streaming SQL with Apache Calcite

• And more…

• Performance

• Usability

THANKS!

Any questions?

Appendix A. Apache NiFi

NiFi EvaluateJsonPath / RouteOnAttribute configuration

NiFi PutHDFS / PublishKafka configuration

NiFi Queue options – Status History

NiFi Queue options – List queue

NiFi Data Provenance

Appendix B. Apache Storm

Distributed Log Search

Dynamic Worker Profiling

Dynamic Log Levels

Topology Event Inspector

Resource Aware Scheduler

Source: Resource Aware Scheduling in Apache Storm, Hadoop Summit San Jose 2016
