Apache NiFi Crash Course Intro
Apache NiFi Crash Course Intro Joe Percivall - @JPercivall Hadoop Summit – Melbourne 1 Sept 2016

Upload: hadoop-summit

Post on 06-Jan-2017

TRANSCRIPT

Page 1: Apache NiFi Crash Course Intro

Apache NiFi Crash Course Intro
Joe Percivall - @JPercivall
Hadoop Summit – Melbourne

1 Sept 2016

Page 2: Apache NiFi Crash Course Intro

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

About Me

• Software Engineer at Hortonworks

• Apache NiFi committer and PMC member

• Github: github.com/JPercivall

Page 3: Apache NiFi Crash Course Intro

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Live Demo

Community

Page 5: Apache NiFi Crash Course Intro

Let’s Connect A to B

Producers A.K.A. Things

Anything AND Everything

Internet!

Consumers
• User
• Storage
• System
• …More Things

Page 6: Apache NiFi Crash Course Intro

Moving data effectively is hard

Standards: http://xkcd.com/927/

Page 7: Apache NiFi Crash Course Intro

Why is moving data effectively hard?

• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery

Page 8: Apache NiFi Crash Course Intro

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs

Let’s consider the needs of a courier service

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, On Delivery Routes, Trucks, Deliverers]

Image credits:
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Page 9: Apache NiFi Crash Course Intro

Great! I am collecting all this data! Let’s use it!

Finding our needles in the haystack

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Kafka, Core Data Center at HQ, Server Cluster, Others, Storm / Spark / Flink / Apex, Kafka, Storm / Spark / Flink / Apex, On Delivery Routes, Trucks, Deliverers]

Page 10: Apache NiFi Crash Course Intro

Let’s Connect Lots of As to Bs to As to Cs to Bs to Δs to Cs to ϕs

Oh, that courier service is global

Page 11: Apache NiFi Crash Course Intro

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Live Demo

Community

Page 12: Apache NiFi Crash Course Intro

• Web-based User Interface for creating, monitoring, & controlling data flows

• Directed graphs of data routing and transformation

• Highly configurable - modify data flow at runtime, dynamically prioritize data

• Easily extensible through development of custom components

• Data Provenance tracks data through entire system

[1] https://nifi.apache.org/

Dataflow

Apache NiFi

Page 13: Apache NiFi Crash Course Intro

Apache NiFi Key Features

• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security*
• Designed for extension
• Clustering
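
To make two of the features above concrete, here is a minimal sketch of how data buffering with backpressure and prioritized queuing could interact on one queue. It is an illustration under stated assumptions, not NiFi's implementation: the class name `Connection`, the `backpressure_threshold` parameter, and the numeric priorities are all hypothetical.

```python
import heapq
import itertools


class Connection:
    """Hypothetical bounded, prioritized queue between two components."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._order = itertools.count()

    def is_full(self):
        # Backpressure: the upstream side should stop producing while True.
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, priority, item):
        """Enqueue an item; return False (refuse) while backpressure applies."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Dequeue the item with the lowest priority number, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
accepted = [conn.offer(5, "bulk log"), conn.offer(1, "alert"), conn.offer(3, "metric")]
first = conn.poll()               # the most urgent item leaves first
retry = conn.offer(3, "metric")   # space was freed, so the producer can resume
second = conn.poll()
```

The third offer is refused because the queue already holds two items; once the consumer polls, the same item is accepted on retry.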

Page 14: Apache NiFi Crash Course Intro

Revisit: Courier service from the perspective of NiFi

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, Trucks, Deliverers, On Delivery Routes, NiFi (x6)]

Page 15: Apache NiFi Crash Course Intro

Courier service from the perspective of NiFi & MiNiFi

[Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Server Cluster, Distribution Center, Core Data Center at HQ, Server Cluster, Trucks, Deliverers, On Delivery Routes, Client Libraries (x3), MiNiFi (x2), NiFi (x6)]

Page 16: Apache NiFi Crash Course Intro

Apache NiFi MiNiFi Key Features

• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Designed for extension
• Design and Deploy
• Warm re-deploys

Page 18: Apache NiFi Crash Course Intro

Visual Command and Control vs. Design and Deploy

Page 19: Apache NiFi Crash Course Intro

Apache NiFi Managed Dataflow

[Diagram labels: SOURCES, REGIONAL INFRASTRUCTURE, CORE INFRASTRUCTURE]

Page 20: Apache NiFi Crash Course Intro

NiFi is based on Flow Based Programming (FBP)

FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
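
The mapping in the table can be sketched as a toy model. This is only an illustration of the FBP vocabulary, assuming a linear chain of processors; every class and method name here (`FlowFile`, `Processor`, `ProcessGroup`, `run_once`) is hypothetical and not NiFi's Java API, and the queues are unbounded for brevity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: attributes plus opaque content."""
    attributes: dict
    content: bytes


class Processor:
    """Black box: applies a function to each FlowFile it is handed."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def process(self, flowfile):
        return self.fn(flowfile)


@dataclass
class ProcessGroup:
    """Subnet: a chain of processors joined by queue connections."""
    processors: list
    connections: list = field(init=False)

    def __post_init__(self):
        # One queue in front of every processor, plus one output queue.
        self.connections = [deque() for _ in range(len(self.processors) + 1)]

    def send(self, flowfile):
        self.connections[0].append(flowfile)   # acts as the input port

    def run_once(self):
        """Stand-in for the flow controller: give every processor one turn."""
        for i, proc in enumerate(self.processors):
            if self.connections[i]:
                out = proc.process(self.connections[i].popleft())
                self.connections[i + 1].append(out)

    def receive(self):
        # Acts as the output port; None when nothing is waiting.
        return self.connections[-1].popleft() if self.connections[-1] else None


def upper(ff):
    return FlowFile(ff.attributes, ff.content.upper())


def tag(ff):
    return FlowFile({**ff.attributes, "tagged": "true"}, ff.content)


group = ProcessGroup([Processor("Upper", upper), Processor("Tag", tag)])
group.send(FlowFile({"filename": "a.txt"}, b"hello"))
group.run_once()
result = group.receive()
```

From the outside the group behaves like a single component with one input and one output, which is the composition idea in the last table row.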

Page 21: Apache NiFi Crash Course Intro

FlowFiles & Data Agnosticism

NiFi is data agnostic! But, NiFi was designed understanding that users can care about specifics and provides tooling to interact with specific formats, protocols, etc.

ISO 8601 - http://xkcd.com/1179/

Robustness principle

“Be conservative in what you do, be liberal in what you accept from others”

Page 22: Apache NiFi Crash Course Intro

FlowFiles are like HTTP data

HTTP Data

Header:
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2010 23:26:07 GMT
Server: Apache/2.2.8 (CentOS) OpenSSL/0.9.8g
Last-Modified: Sun, 26 Sep 2010 22:04:35 GMT
ETag: "45b6-834-49130cc1182c0"
Accept-Ranges: bytes
Content-Length: 13
Connection: close
Content-Type: text/html

Content:
Hello world!

FlowFile

Header:
Standard FlowFile Attributes
Key: 'entryDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'lineageStartDate' Value: 'Fri Jun 17 17:15:04 EDT 2016'
Key: 'fileSize' Value: '23609'
FlowFile Attribute Map Content
Key: 'filename' Value: '15650246997242'
Key: 'path' Value: './'

Content:
Binary Content *
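
The analogy can be sketched in a few lines: header fields become a key/value attribute map and the body stays as untouched bytes. This is a hedged illustration, not how NiFi ingests HTTP; the `FlowFile` class, the `from_http_response` helper, and the attribute name `http.status.line` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowFile:
    attributes: dict   # the "header": key/value strings
    content: bytes     # the "body": opaque bytes, never interpreted here


def from_http_response(raw: bytes) -> FlowFile:
    """Split a raw HTTP response into header fields and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    attributes = {"http.status.line": lines[0]}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        attributes[name.strip().lower()] = value.strip()
    return FlowFile(attributes, body)


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 13\r\n"
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"Hello world!\n")
ff = from_http_response(raw)
```

Routing decisions can then be made on `ff.attributes` alone, without parsing `ff.content`, which is what keeps the framework data agnostic.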

Page 23: Apache NiFi Crash Course Intro

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Live Demo

Community

Page 24: Apache NiFi Crash Course Intro

NiFi Architecture 0.x line

[Diagram, NiFi node (repeated for each node shown): OS/Host, JVM, Web Server, Flow Controller, Processor 1, Extension N, FlowFile Repository, Content Repository, Provenance Repository, Local Storage]

[Diagram, cluster: Master: NiFi Cluster Manager (NCM) with OS/Host, JVM, Web Server, NiFi Cluster Manager – Request Replicator; Slaves: NiFi Nodes]

Page 25: Apache NiFi Crash Course Intro

NiFi Architecture 1.x line

Page 26: Apache NiFi Crash Course Intro

NiFi Architecture 1.x line

Page 27: Apache NiFi Crash Course Intro

NiFi Architecture – Repositories - Pass by reference

Excerpt of demo flow… What’s happening inside the repositories…

       | FlowFile | Content | Provenance
BEFORE | F1 → C1  | C1      | P1 F1
AFTER  | F2 → C1  | C1      | P3 F2 – Clone (F1)
       | F1 → C1  |         | P2 F1 – Route
       |          |         | P1 F1 – Create
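
The pass-by-reference idea on this slide can be sketched with three in-memory stand-ins for the repositories. This is an assumption-laden illustration, not NiFi's storage code: the `Repositories` class and its method names are hypothetical, and IDs such as `C1` and `P1` simply mirror the slide's labels.

```python
import itertools


class Repositories:
    """Hypothetical in-memory stand-ins for the three repositories."""

    def __init__(self):
        self.content = {}      # claim id -> bytes
        self.flowfiles = {}    # FlowFile id -> claim id (a reference, not a copy)
        self.provenance = []   # append-only (event id, FlowFile id, event type)
        self._claims = itertools.count(1)
        self._events = itertools.count(1)

    def _record(self, flowfile_id, event_type):
        self.provenance.append((f"P{next(self._events)}", flowfile_id, event_type))

    def create(self, flowfile_id, data):
        claim = f"C{next(self._claims)}"
        self.content[claim] = data
        self.flowfiles[flowfile_id] = claim
        self._record(flowfile_id, "CREATE")

    def route(self, flowfile_id):
        # Routing changes where the FlowFile goes, not what it contains.
        self._record(flowfile_id, "ROUTE")

    def clone(self, parent_id, child_id):
        # The clone points at the parent's claim; no bytes are copied.
        self.flowfiles[child_id] = self.flowfiles[parent_id]
        self._record(child_id, "CLONE")


repos = Repositories()
repos.create("F1", b"payload")
repos.route("F1")
repos.clone("F1", "F2")
```

After these three calls there are two FlowFile entries but still only one content entry, matching the AFTER row of the slide.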

Page 28: Apache NiFi Crash Course Intro

NiFi Architecture – Repositories – Copy on Write

Excerpt of demo flow… What’s happening inside the repositories…

       | FlowFile  | Content        | Provenance
BEFORE | F1 → C1   | C1             | P1 F1 - CREATE
AFTER  | F1 → C1   | C2 (encrypted) | P2 F1.1 - MODIFY
       | F1.1 → C2 | C1 (plaintext) | P1 F1 - CREATE
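
Copy on write can be sketched the same way: a modification writes a new content entry and leaves the original bytes untouched. Everything below is a hypothetical illustration. The `Repositories` class is not NiFi's API, the XOR transform is only a stand-in for encryption, and dropping the old FlowFile entry after the modification is an assumption of this sketch.

```python
class Repositories:
    """Hypothetical in-memory stand-ins for the three repositories."""

    def __init__(self):
        self.content = {}      # claim id -> bytes
        self.flowfiles = {}    # FlowFile id -> claim id
        self.provenance = []   # append-only (event id, FlowFile id, event type)

    def _record(self, flowfile_id, event_type):
        self.provenance.append((f"P{len(self.provenance) + 1}", flowfile_id, event_type))

    def _new_claim(self, data):
        claim = f"C{len(self.content) + 1}"
        self.content[claim] = data
        return claim

    def create(self, flowfile_id, data):
        self.flowfiles[flowfile_id] = self._new_claim(data)
        self._record(flowfile_id, "CREATE")

    def modify(self, flowfile_id, new_id, transform):
        old_claim = self.flowfiles.pop(flowfile_id)
        # Copy on write: the result goes to a NEW claim; the old bytes stay as-is.
        self.flowfiles[new_id] = self._new_claim(transform(self.content[old_claim]))
        self._record(new_id, "MODIFY")


def fake_encrypt(data):
    # Stand-in only: XOR with a constant is NOT real encryption.
    return bytes(b ^ 0x5A for b in data)


repos = Repositories()
repos.create("F1", b"plaintext")
repos.modify("F1", "F1.1", fake_encrypt)
```

Because `C1` is never overwritten, the content as it existed at the CREATE event remains available for as long as the content repository retains it.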

Page 29: Apache NiFi Crash Course Intro

NiFi vs MiNiFi Java Processes

[Diagram labels: MiNiFi (NiFi Framework, Components); NiFi (NiFi Framework, User Interface, Components)]

Page 30: Apache NiFi Crash Course Intro

NiFi Java Processes

[Diagram labels: Bootstrap, NiFi, UI, bootstrap.conf, nifi.properties, flow.xml.gz; arrows: starts, reads, reads, reads & modifies; legend: NiFi, MiNiFi]

Page 31: Apache NiFi Crash Course Intro

MiNiFi Java Processes

[Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifier(s), bootstrap.conf, nifi.properties, flow.xml.gz, config.yml; arrows: starts, reads (x3), transforms into; legend: NiFi, MiNiFi]

Page 32: Apache NiFi Crash Course Intro

Simple config.yml

Tail a rolling file -> Site to Site

Page 33: Apache NiFi Crash Course Intro

Config Change Notifiers

Two implementations
– RestChangeNotifier
  • HTTP(S)
– FileChangeNotifier

Configured in bootstrap.conf

Page 34: Apache NiFi Crash Course Intro

Change notifier update

1. Initial state
– Both running

[Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers]

Page 35: Apache NiFi Crash Course Intro

Change notifier update

2. User sends update through notifier
– HTTP(S) post request
– Change watched file

[Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers; arrow: user creates new configuration]

Page 36: Apache NiFi Crash Course Intro

Change notifier update

3. Bootstrap validation
– Basic validation
– Rest notifier will respond accordingly
– Results logged

[Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers; arrows: user creates new configuration, validate new configuration]

Page 37: Apache NiFi Crash Course Intro

Change notifier update

4. Bootstrap saves and transforms
– Copy old config.yml to a swap file

[Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, config.yml, nifi.properties, flow.xml.gz; arrows: user creates new configuration, validate new configuration, saves new, transforms into]

Page 38: Apache NiFi Crash Course Intro

Change notifier update

5. Bootstrap attempts restart
– MiNiFi reads in the new nifi.properties and flow.xml.gz

[Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, config.yml, nifi.properties, flow.xml.gz; arrows: user creates new configuration, validate new configuration, saves new, transforms into, attempt restart, reads]

Page 39: Apache NiFi Crash Course Intro

Change notifier update

6. Success or Fail
– Successful restart, continue processing
– Failure, rollback to old config
– Existing Data is mapped or orphaned

[Diagram labels: MiNiFi, Bootstrap, Configuration Change Notifiers, config.yml, nifi.properties, flow.xml.gz; arrows: user creates new configuration, validate new configuration, saves new, transforms into, attempt restart, reads]
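
The update sequence on the last few slides (validate, save and transform, attempt restart, roll back on failure) can be sketched as one function. This is a hedged illustration, not MiNiFi's bootstrap code: `apply_config_update`, the `files` dict standing in for the file system, the swap-file name, and the stub `validate`, `transform` and `restart` callables are all hypothetical. Deleting the swap file after a successful restart is also an assumption.

```python
def apply_config_update(files, new_config, validate, transform, restart):
    """Return True if the new configuration is running, else False.

    `files` is a dict acting as a tiny file system (illustration only).
    """
    # Step 3: basic validation before anything on disk changes.
    if not validate(new_config):
        return False

    # Step 4: copy the old config.yml to a swap file, save the new one,
    # then transform it into nifi.properties and flow.xml.gz.
    files["config.yml.swap"] = files["config.yml"]
    files["config.yml"] = new_config
    files["nifi.properties"], files["flow.xml.gz"] = transform(new_config)

    # Step 5: attempt a restart with the regenerated files.
    if restart(files):
        del files["config.yml.swap"]   # assumption: swap no longer needed
        return True

    # Step 6 (failure): roll back to the old config and restart with it.
    files["config.yml"] = files.pop("config.yml.swap")
    files["nifi.properties"], files["flow.xml.gz"] = transform(files["config.yml"])
    restart(files)
    return False


# Stub collaborators, purely for demonstration.
def validate(cfg):
    return cfg.strip() != ""


def transform(cfg):
    return (f"# properties from: {cfg}", f"<flow source='{cfg}'/>")


def restart(files):
    return "broken" not in files["config.yml"]


files = {"config.yml": "flow: v1"}
files["nifi.properties"], files["flow.xml.gz"] = transform(files["config.yml"])

good = apply_config_update(files, "flow: v2", validate, transform, restart)
bad = apply_config_update(files, "flow: broken", validate, transform, restart)
empty = apply_config_update(files, "   ", validate, transform, restart)
```

The second call passes validation but fails to restart, so the sketch restores `flow: v2`; the third is rejected during validation and never touches the files.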

Page 40: Apache NiFi Crash Course Intro

Agenda

What is dataflow and what are the challenges?

Apache NiFi

Architecture

Demo

Community

Page 42: Apache NiFi Crash Course Intro

Brief history of the Apache NiFi Community

• Contributors from Government and several commercial industries
• Releases on a 6-8 week schedule

Timeline:
2006: Code developed at NSA
2006-2014: Matured at NSA
November 2014: Code available open source, ASL v2
July 2015: Achieved TLP status in just 7 months
Today

Page 43: Apache NiFi Crash Course Intro

MiNiFi Prospective Plans - Centralized Command and Control

Design at a centralized place, deploy on the edge
– Flow deployment
– NAR deployment
– Agent deployment

Version control of flows
Agent status monitoring
Bi-directional command and control

Centralized management console with a UI

Page 44: Apache NiFi Crash Course Intro

Hortonworks Dataflow

[Diagram labels: Ingestion, Simple Event Processing, Engine, Stream Processing, Destination, Data Bus]

Page 45: Apache NiFi Crash Course Intro

Questions?

Page 46: Apache NiFi Crash Course Intro

Learn more and join us!

Apache NiFi site: http://nifi.apache.org

Subproject MiNiFi site: http://nifi.apache.org/minifi/

Subscribe to and collaborate [email protected]@nifi.apache.org

Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI

Follow us on Twitter: @apachenifi

Page 47: Apache NiFi Crash Course Intro

Thank you!