1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Mission to NARs with Apache NiFi
Aldrin Piri - @aldrinpiri
ApacheCon Big Data 2016
12 May 2016
Tutorial Resources
https://github.com/apiri/nifi-mission-to-nars-workshop
Agenda
• Start with a dataflow… but we can do better!
• Do better with the NiFi Framework and custom processor
• Extension Points: Processors, Controller Services, Reporting Tasks
• Process Session & Process Context
• How the API ties to the NiFi repositories
• Testing isn’t that bad!
• Share with templates!
Adding new functionality and development approach
Extending the platform is about leveraging the expansive Java ecosystem and existing code
– Make use of open source projects and provided libraries for targeted systems and services
– Reuse existing, proprietary or closed source libraries and wrap their functionality in the framework
The test framework provides a powerful means of testing extensions in isolation, as they would work in a live instance
Deployment is as simple as copying the created NAR to the lib directory of your instance(s)
Minimal Dependencies Needed
Java Development Kit, version 1.7 or later
Maven, version 3.1.0+
Boilerplate Code is provided via Maven Archetype
Support for creating bundles of major extension points of Processors and Controller Services
– Processor Bundle
– Controller Service Bundle
What is a NAR?
NAR == NiFi ARchive
– Bundles the developed code to provide extensions and their dependencies
– Allows extension classloader isolation, easing the versioning issues that can be pervasive when interacting with a wide variety of systems, services, and formats
Consider it to be an OSGi-lite package
NAR Bundle Structure
How long does it take to create an extension?
Incorporating functionality from an existing library
– Create a bundle
– Include a dependency on the library
– Design User Experience
• Properties – How can this extension be configured? What are valid values for user input?
• Relationships – How will data move to the next stage of its processing?
– Wrap the core classes of the library in the framework and implement onTrigger
• ProcessSession abstracts interactions with backing repositories and handles unit-of-work sessions
• ProcessContext allows accessing defined properties which the framework has validated
– Test
– Deploy
For the majority of cases, development time is measured in hours*
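The steps on this slide can be sketched in plain Java. This is a toy model only: ToyProperty, ToyContext, ToySession, and PrefixProcessor are hypothetical stand-ins invented for illustration and are not the NiFi API types. The sketch assumes a "library" that merely prepends a configured prefix, and shows a validated property, two relationships, and an onTrigger that takes one unit of work from the session and routes the result.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for a configurable property with a validator.
class ToyProperty {
    final String name;
    final Predicate<String> validator; // "What are valid values for user input?"

    ToyProperty(String name, Predicate<String> validator) {
        this.name = name;
        this.validator = validator;
    }
}

// Hypothetical stand-in for the context: values are validated before storage,
// so the extension only ever reads values that passed validation.
class ToyContext {
    private final Map<String, String> values = new HashMap<>();

    void set(ToyProperty property, String value) {
        if (!property.validator.test(value)) {
            throw new IllegalArgumentException("Invalid value for " + property.name);
        }
        values.put(property.name, value);
    }

    String get(ToyProperty property) {
        return values.get(property.name);
    }
}

// Hypothetical stand-in for the session: transfers are staged and only
// become visible downstream once the unit of work is committed.
class ToySession {
    private final Deque<String> incoming = new ArrayDeque<>();
    private final Map<String, List<String>> staged = new HashMap<>();
    final Map<String, List<String>> committed = new HashMap<>();

    void enqueue(String content) { incoming.add(content); }

    String get() { return incoming.poll(); } // null when nothing is queued

    void transfer(String content, String relationship) {
        staged.computeIfAbsent(relationship, k -> new ArrayList<>()).add(content);
    }

    void commit() {
        staged.forEach((rel, items) ->
                committed.computeIfAbsent(rel, k -> new ArrayList<>()).addAll(items));
        staged.clear();
    }
}

// The "extension": wraps trivial library logic (string concatenation).
class PrefixProcessor {
    static final ToyProperty PREFIX =
            new ToyProperty("Prefix", v -> v != null && !v.isEmpty());
    static final String REL_SUCCESS = "success";
    static final String REL_FAILURE = "failure";

    void onTrigger(ToyContext context, ToySession session) {
        String content = session.get();
        if (content == null) {
            return; // no work available
        }
        if (content.isEmpty()) {
            session.transfer(content, REL_FAILURE);
        } else {
            session.transfer(context.get(PREFIX) + content, REL_SUCCESS);
        }
    }
}
```

In this model nothing reaches the committed map until commit() is called, which is the unit-of-work idea the slide attributes to ProcessSession.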
How long does it really take to create an extension?
Increased development effort may be needed for handling specific protocols
– Driven through manual management of sessions, when there are resources with their own lifecycles beyond the sole onTrigger method
– Common for protocol “Listeners”
For the majority of cases, development time is still measured in hours
Behind the Scenes
Architecture
[Diagram: on the OS/Host, a JVM runs the Web Server and the Flow Controller, which hosts Processor 1 through Extension N; the FlowFile Repository, Content Repository, and Provenance Repository sit on Local Storage.]
NiFi Architecture – Repositories – Pass by Reference
BEFORE
FlowFile: F1 (content C1)
Content: C1
Provenance: P1 F1 – Create
AFTER
FlowFile: F1 (content C1); F2 (content C1)
Content: C1
Provenance: P1 F1 – Create; P2 F1 – Route; P3 F2 – Clone (F1)
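The pass-by-reference behaviour on this slide can be modelled with a small sketch. RefFlowFile and RefRepositories are hypothetical names invented here, not NiFi classes; the assumption is simply that a FlowFile holds a key into the content store rather than the bytes themselves, so routing and cloning add provenance events without copying content.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: a FlowFile carries a reference to content, not the content.
class RefFlowFile {
    final String id;
    final String contentClaim; // key into the content store

    RefFlowFile(String id, String contentClaim) {
        this.id = id;
        this.contentClaim = contentClaim;
    }
}

// Hypothetical stand-in for the three repositories on the slide.
class RefRepositories {
    final Map<String, RefFlowFile> flowFiles = new HashMap<>();
    final Map<String, byte[]> content = new HashMap<>();
    final List<String> provenance = new ArrayList<>();

    RefFlowFile create(String id, String claim, byte[] bytes) {
        content.put(claim, bytes);
        RefFlowFile flowFile = new RefFlowFile(id, claim);
        flowFiles.put(id, flowFile);
        provenance.add(id + " - Create");
        return flowFile;
    }

    // Routing touches no content at all; only an event is recorded.
    void route(RefFlowFile flowFile) {
        provenance.add(flowFile.id + " - Route");
    }

    // Cloning creates a second FlowFile that points at the same content claim.
    RefFlowFile cloneOf(RefFlowFile parent, String newId) {
        RefFlowFile copy = new RefFlowFile(newId, parent.contentClaim);
        flowFiles.put(newId, copy);
        provenance.add(newId + " - Clone (" + parent.id + ")");
        return copy;
    }
}
```

After a create, a route, and a clone, the model holds two FlowFiles, three provenance events, and still only one content entry, matching the AFTER state above.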
NiFi Architecture – Repositories – Copy on Write
BEFORE
FlowFile: F1 (content C1)
Content: C1
Provenance: P1 F1 – CREATE
AFTER
FlowFile: F1 (content C1); F1.1 (content C2)
Content: C1 (plaintext); C2 (encrypted)
Provenance: P1 F1 – CREATE; P2 F1.1 – MODIFY
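Copy on write can be sketched the same way. CowFlowFile, CowRepositories, and ToyCipher are hypothetical names for illustration, and the XOR transform is only a placeholder for encryption, not a real cipher. The point is that a modification writes a new content entry and leaves the original entry untouched.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in: a FlowFile version pointing at one content claim.
class CowFlowFile {
    final String id;
    final String contentClaim;

    CowFlowFile(String id, String contentClaim) {
        this.id = id;
        this.contentClaim = contentClaim;
    }
}

// Hypothetical stand-in for the content and provenance repositories.
class CowRepositories {
    final Map<String, byte[]> content = new HashMap<>();
    final List<String> provenance = new ArrayList<>();

    // Content entries are only ever added, never overwritten in place.
    private String store(byte[] bytes) {
        String claim = "C" + (content.size() + 1);
        content.put(claim, bytes);
        return claim;
    }

    CowFlowFile create(String id, byte[] bytes) {
        String claim = store(bytes.clone());
        provenance.add(id + " - CREATE");
        return new CowFlowFile(id, claim);
    }

    // The transform works on a copy; the result is stored under a new claim.
    CowFlowFile modify(CowFlowFile original, String newId, UnaryOperator<byte[]> transform) {
        byte[] copy = content.get(original.contentClaim).clone();
        String claim = store(transform.apply(copy));
        provenance.add(newId + " - MODIFY");
        return new CowFlowFile(newId, claim);
    }
}

// Placeholder "encryption" for the demo only; XOR is not a real cipher.
class ToyCipher {
    static byte[] xor(byte[] data) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ 0x5A);
        }
        return out;
    }
}
```

Calling modify(f1, "F1.1", ToyCipher::xor) leaves C1 holding the plaintext and adds C2 with the transformed bytes, so the earlier version remains recoverable.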
Quick (and dirty?) Prototyping
Prototype Dataflows Using Existing Binaries/Applications
ExecuteProcess – Acts as a source processor, creating FlowFiles containing data written to STDOUT by the target application
ExecuteStreamCommand – Provides content of FlowFiles to an external application via STDIN and creates FlowFiles containing data written to STDOUT
Processors allow making external calls to applications and programs outside of the JVM
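The STDIN/STDOUT pattern described for ExecuteStreamCommand can be illustrated with the standard java.lang.ProcessBuilder. This is a minimal sketch of the general technique, not the processor's actual implementation; StreamCommandSketch and pipeThrough are hypothetical names, and the code assumes Java 9 or later for InputStream.readAllBytes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class StreamCommandSketch {

    // Feeds the input bytes to the command's STDIN and returns what it wrote to STDOUT.
    // Note: writing everything before reading is only safe for small payloads;
    // large payloads would need the write and read to happen concurrently.
    static byte[] pipeThrough(byte[] input, String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectError(ProcessBuilder.Redirect.INHERIT) // pass STDERR through
                    .start();

            // getOutputStream() is connected to the child's STDIN.
            try (OutputStream stdin = process.getOutputStream()) {
                stdin.write(input);
            }

            // getInputStream() is connected to the child's STDOUT.
            byte[] output;
            try (InputStream stdout = process.getInputStream()) {
                output = stdout.readAllBytes();
            }

            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("Command exited with status " + exitCode);
            }
            return output;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }
}
```

On a POSIX system with tr available, pipeThrough("hello\n".getBytes(), "tr", "a-z", "A-Z") would pass the bytes through tr and return the upper-cased result, much as a prototype flow might lean on an existing binary before a native processor is written.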
Increased Flexibility of Prototyping via Scripting Languages
ExecuteScript – Acts as a source processor, creating FlowFiles containing data from a referenced Script
InvokeScriptedProcessor – Provides access to the core framework API for interacting with NiFi like a native Java processor
Processors allow using JVM friendly interpreted languages
Resources
Developer Guide
– http://nifi.apache.org/developer-guide.html
Apache NiFi Maven Archetypes
– https://cwiki.apache.org/confluence/display/NIFI/Maven+Projects+for+Extensions
Mission to NARs with Apache NiFi sample bundle
– https://github.com/apiri/nifi-mission-to-nars-workshop
Thanks for hanging out!