The Avant-garde of Apache NiFi
Joe Percivall - @JPercivall
Hadoop Summit – Melbourne
31 August 2016

Page 2: The Avant-garde of Apache NiFi

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

About Me
• Software Engineer at Hortonworks

• Apache NiFi committer and PMC member

• Github: github.com/JPercivall

Page 3: The Avant-garde of Apache NiFi

Agenda
• Intro to NiFi

• What’s new in NiFi 1.0.0

• Intro to MiNiFi

• MiNiFi Architecture

• NiFi & MiNiFi Demo

Page 5: The Avant-garde of Apache NiFi

Let’s Connect A to B

Producers A.K.A. Things
• Anything AND Everything

Internet!

Consumers
• User
• Storage
• System
• …More Things

Page 6: The Avant-garde of Apache NiFi

Why is moving data effectively hard?

• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery

Page 7: The Avant-garde of Apache NiFi

• Web-based User Interface for creating, monitoring, & controlling data flows

• Directed graphs of data routing and transformation

• Highly configurable - modify data flow at runtime, dynamically prioritize data

• Easily extensible through development of custom components

• Data Provenance tracks data through entire system

[1] https://nifi.apache.org/

Dataflow

Apache NiFi

Page 8: The Avant-garde of Apache NiFi

Apache NiFi Key Features

• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
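
The backpressure and pressure release items in the feature list can be illustrated with a small sketch. This is not NiFi code: `BoundedQueue`, `threshold`, and `max_age` are hypothetical names, and the sketch assumes that backpressure means refusing new items once a queue reaches a limit, while pressure release means aging off items that have waited too long.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BoundedQueue:
    """Hypothetical buffer illustrating backpressure and pressure release."""
    threshold: int   # item count at which backpressure engages (assumed setting)
    max_age: float   # seconds an item may wait before it is aged off (assumed setting)
    items: deque = field(default_factory=deque)  # (enqueue_time, payload) pairs

    def offer(self, payload, now: float) -> bool:
        """Backpressure: refuse the item while the queue is at its threshold."""
        if len(self.items) >= self.threshold:
            return False  # the producer should slow down and retry later
        self.items.append((now, payload))
        return True

    def expire(self, now: float) -> list:
        """Pressure release: drop and return payloads older than max_age."""
        expired = []
        # Items are appended in time order, so the oldest is always on the left.
        while self.items and now - self.items[0][0] > self.max_age:
            expired.append(self.items.popleft()[1])
        return expired
```

In this sketch a producer that gets `False` from `offer` simply waits, and a periodic call to `expire` frees room by discarding data whose value has perished. Timestamps are passed in explicitly so the behaviour is easy to reason about.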

Page 9: The Avant-garde of Apache NiFi

Simplified Example
Let’s consider the needs of a courier service

Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, On Delivery Routes, Trucks, Deliverers

Icon credits:
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Page 10: The Avant-garde of Apache NiFi

Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack

Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Storm / Spark / Flink / Apex, Core Data Center at HQ, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others, On Delivery Routes, Trucks, Deliverers

Page 11: The Avant-garde of Apache NiFi

Let’s revisit our courier service from the perspective of NiFi

Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Storm / Spark / Flink / Apex, Core Data Center at HQ, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others, On Delivery Routes, Trucks, Deliverers, NiFi (six instances)

Page 12: The Avant-garde of Apache NiFi

Fundamental Terminology

FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)

Processor
• Performs the work, can access FlowFiles

Connection
• Links between processors
• Queues that can be dynamically prioritized

git clone https://github.com/JPercivall/nifi-developer-tutorial.git
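
The three terms on this slide can be modelled in a few lines. The sketch below is illustrative only and is not the NiFi API: the `FlowFile` dataclass, the `tag_size` processor, and the `Connection` class with its `set_prioritizer` method are hypothetical names chosen to mirror the slide's vocabulary.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FlowFile:
    """Unit of data: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def tag_size(flowfile: FlowFile) -> FlowFile:
    """Example processor: records the content length as an attribute."""
    attrs = dict(flowfile.attributes, size=str(len(flowfile.content)))
    return FlowFile(flowfile.content, attrs)


class Connection:
    """Queue between processors whose ordering can be changed at runtime."""

    def __init__(self, prioritizer: Callable[[FlowFile], int]):
        self._prioritizer = prioritizer
        self._counter = itertools.count()  # tie-breaker: FIFO among equal priorities
        self._heap = []

    def put(self, flowfile: FlowFile) -> None:
        entry = (self._prioritizer(flowfile), next(self._counter), flowfile)
        heapq.heappush(self._heap, entry)

    def get(self) -> FlowFile:
        """Remove and return the FlowFile with the lowest priority value."""
        return heapq.heappop(self._heap)[2]

    def set_prioritizer(self, prioritizer: Callable[[FlowFile], int]) -> None:
        """Swap the ordering rule and re-queue everything already waiting."""
        pending = [entry[2] for entry in sorted(self._heap)]
        self._prioritizer = prioritizer
        self._heap = []
        for flowfile in pending:
            self.put(flowfile)
```

A connection built with `Connection(prioritizer=lambda ff: len(ff.content))` hands out the smallest FlowFile first. Calling `set_prioritizer` with a different function reorders what is already queued, which is one way to read "dynamically prioritized".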

Page 14: The Avant-garde of Apache NiFi

Apache NiFi 1.0.0

• Zero Master Clustering
• UI Refresh
• Multi-tenant authorization and internal authorization/policy management
• 15+ new components
• Over 450 tickets closed!

Page 15: The Avant-garde of Apache NiFi

Zero Master Clustering

Page 17: The Avant-garde of Apache NiFi

UI Refresh & Multi-tenant Authorization

Page 19: The Avant-garde of Apache NiFi

Revisit: Courier service from the perspective of NiFi

Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, On Delivery Routes, Trucks, Deliverers, NiFi (six instances)

Page 20: The Avant-garde of Apache NiFi

Courier service from the perspective of NiFi & MiNiFi

Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, On Delivery Routes, Trucks, Deliverers, Client Libraries (three instances), MiNiFi (two instances), NiFi (six instances)

Page 21: The Avant-garde of Apache NiFi

Apache NiFi MiNiFi Key Features

• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Designed for extension
• Design and Deploy
• Warm re-deploys

Page 23: The Avant-garde of Apache NiFi

Visual Command and Control vs. Design and Deploy

Page 24: The Avant-garde of Apache NiFi

Created to more effectively collect data at the edge

Page 26: The Avant-garde of Apache NiFi

NiFi vs MiNiFi Java Processes

MiNiFi: NiFi Framework, Components
NiFi: NiFi Framework, User Interface, Components

Page 27: The Avant-garde of Apache NiFi

NiFi Java Processes

Diagram: Bootstrap reads bootstrap.conf and starts NiFi. NiFi, which includes the UI, reads nifi.properties and reads & modifies flow.xml.gz.

Page 28: The Avant-garde of Apache NiFi

MiNiFi Java Processes

Diagram: Bootstrap, which hosts the Configuration Change Notifier(s), reads bootstrap.conf and config.yml, transforms config.yml into nifi.properties and flow.xml.gz, and starts MiNiFi. MiNiFi reads nifi.properties and flow.xml.gz.

Page 29: The Avant-garde of Apache NiFi

Same Extensible Framework (NARs)

• In minifi-0.0.1, the nifi-0.6.1 standard processors are bundled (~20 MB)
  - Tailing a Log
  - UpdateAttribute
  - Routing by content or attributes
  - PutEmail
• Allows MiNiFi to use NiFi processors

Page 30: The Avant-garde of Apache NiFi

Simple config.yml
Tail a rolling file -> Site to Site

Page 33: The Avant-garde of Apache NiFi

Questions?

Page 34: The Avant-garde of Apache NiFi

Thank you!

Page 35: The Avant-garde of Apache NiFi

Learn more and join us!

Apache NiFi site
http://nifi.apache.org

Subproject MiNiFi site
http://nifi.apache.org/minifi/

Subscribe to and collaborate [email protected]@nifi.apache.org

Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI

Follow us on Twitter
@apachenifi

Page 36: The Avant-garde of Apache NiFi

Back-up

Page 37: The Avant-garde of Apache NiFi

Brief history of the Apache NiFi Community

Matured at NSA 2006-2014

• Contributors from Government and several commercial industries
• Releases on a 6-8 week schedule
• Apache NiFi 1.0.0 release on the horizon
• Zero-Master Clustering

Timeline:
• 2006: Code developed at NSA
• November 2014: Code available open source (ASL v2)
• July 2015: Achieved TLP status in just 7 months
• Today

Page 38: The Avant-garde of Apache NiFi

A bit more complex config.yml
Tail a rolling File -> Secure Site to Site with Provenance

Page 39: The Avant-garde of Apache NiFi

MiNiFi 0.0.1-Java

• Declarative configuration of processing flows through a YAML configuration file
• Exporting of provenance events to another NiFi instance via a Reporting Task over Site to Site
• Flow change configuration watcher implementations that reload a NiFi instance when an updated flow is received over REST or a change is detected on a file system
• A mechanism to query an instance's status
• <40 MB binary distribution

Release Notes
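
The "changes on a file system" watcher mentioned in the feature list can be pictured as a simple polling loop. The sketch below is an assumption about the general idea, not MiNiFi's implementation: `FileChangeWatcher` and `poll` are hypothetical names, and comparing content digests is just one way to detect a change.

```python
import hashlib
from pathlib import Path
from typing import Optional


class FileChangeWatcher:
    """Hypothetical poll-based watcher that reports content changes of one file."""

    def __init__(self, path: Path):
        self._path = path
        self._last_digest = self._digest()

    def _digest(self) -> Optional[str]:
        """Return a SHA-256 digest of the file, or None if it does not exist."""
        if not self._path.exists():
            return None
        return hashlib.sha256(self._path.read_bytes()).hexdigest()

    def poll(self) -> bool:
        """Return True once for each detected change since the previous poll."""
        current = self._digest()
        changed = current is not None and current != self._last_digest
        if changed:
            self._last_digest = current
        return changed
```

A caller would invoke `poll()` on a timer and hand the new file contents to whatever applies the update. Hashing the content rather than checking the modification time means that rewriting the file with identical content does not trigger a reload.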

Page 40: The Avant-garde of Apache NiFi

Change notifier update

1. Initial state
  - Both running

Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers

Page 41: The Avant-garde of Apache NiFi

Change notifier update

2. User sends update through notifier
  - HTTP(S) POST request
  - Change watched file

Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, user creates new configuration

Page 42: The Avant-garde of Apache NiFi

Change notifier update

3. Bootstrap validation
  - Basic validation
  - REST notifier will respond accordingly
  - Results logged

Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, user creates new configuration, validate new configuration

Page 43: The Avant-garde of Apache NiFi

Change notifier update

4. Bootstrap saves and transforms
  - Copy old config.yml to a swap file

Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, user creates new configuration, validate new configuration, saves new config.yml, transforms into nifi.properties and flow.xml.gz

Page 44: The Avant-garde of Apache NiFi

Change notifier update

5. Bootstrap attempts restart
  - MiNiFi reads in the new nifi.properties and flow.xml.gz

Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, user creates new configuration, validate new configuration, saves new config.yml, transforms into nifi.properties and flow.xml.gz, attempt restart, reads

Page 45: The Avant-garde of Apache NiFi

Change notifier update

6. Success or Fail
  - Successful restart: continue processing
  - Failure: rollback to old config
  - Existing data is mapped or orphaned

Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, user creates new configuration, validate new configuration, saves new config.yml, transforms into nifi.properties and flow.xml.gz, attempt restart, reads
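
The six-step update sequence walked through on the preceding slides can be sketched as one function. This is a minimal illustration under stated assumptions, not the MiNiFi bootstrap code: `apply_config_update` and its `validate`, `transform`, and `restart` callbacks are hypothetical stand-ins for the validation, config.yml transformation, and restart steps, and the `.swap` suffix is an assumed file name.

```python
from pathlib import Path
from typing import Callable


def apply_config_update(
    new_config: str,
    config_path: Path,
    validate: Callable[[str], bool],
    transform: Callable[[str], None],
    restart: Callable[[], bool],
) -> str:
    """Apply a new config; return 'rejected', 'applied', or 'rolled_back'."""
    # Step 3: basic validation before anything on disk changes.
    if not validate(new_config):
        return "rejected"

    # Step 4: copy the old config to a swap file, save the new one, transform it.
    swap_path = config_path.with_name(config_path.name + ".swap")
    swap_path.write_text(config_path.read_text())
    config_path.write_text(new_config)
    transform(new_config)

    # Step 5: attempt a restart using the regenerated files.
    if restart():
        swap_path.unlink()  # Step 6 (success): the swap copy is no longer needed
        return "applied"

    # Step 6 (failure): restore the old config and regenerate from it.
    swap_path.replace(config_path)
    transform(config_path.read_text())
    restart()
    return "rolled_back"
```

Passing the three steps in as callbacks keeps the sketch independent of how validation, transformation, and process control are really done. The sketch does not model what happens to data already queued in the flow, which the last slide describes as being mapped or orphaned.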