mirror of https://github.com/apache/nifi.git
Merge branch 'develop' of https://git-wip-us.apache.org/repos/asf/incubator-nifi into develop
This commit is contained in:
commit
40890c9aec
|
@ -16,6 +16,8 @@
|
||||||
//
|
//
|
||||||
NiFi System Administrator's Guide
|
NiFi System Administrator's Guide
|
||||||
=================================
|
=================================
|
||||||
|
Apache NiFi Team <dev@nifi.incubator.apache.org>
|
||||||
|
:homepage: http://nifi.incubator.apache.org
|
||||||
|
|
||||||
How to install
|
How to install
|
||||||
--------------
|
--------------
|
||||||
|
|
|
@ -16,6 +16,8 @@
|
||||||
//
|
//
|
||||||
NiFi Developer's Guide
|
NiFi Developer's Guide
|
||||||
======================
|
======================
|
||||||
|
Apache NiFi Team <dev@nifi.incubator.apache.org>
|
||||||
|
:homepage: http://nifi.incubator.apache.org
|
||||||
|
|
||||||
The designed points of extension
|
The designed points of extension
|
||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
|
@ -14,18 +14,118 @@
|
||||||
// See the License for the specific language governing permissions and
|
// See the License for the specific language governing permissions and
|
||||||
// limitations under the License.
|
// limitations under the License.
|
||||||
//
|
//
|
||||||
NiFi Overview
|
Apache NiFi Overview
|
||||||
=============
|
====================
|
||||||
|
Apache NiFi Team <dev@nifi.incubator.apache.org>
|
||||||
|
:homepage: http://nifi.incubator.apache.org
|
||||||
|
|
||||||
The problem NiFi solves
|
What is Apache NiFi?
|
||||||
-----------------------
|
--------------------
|
||||||
Dataflow at scale...
|
Put simply, NiFi was built to automate the flow of data between systems. While
|
||||||
|
the term 'dataflow' is used in a variety of contexts, we'll use it here
|
||||||
|
to mean the automated and managed flow of information between systems. This
|
||||||
|
problem space has been around ever since enterprises had more than one system
|
||||||
|
where some of the systems created data and some of the systems consumed data.
|
||||||
|
The problems and solution patterns that emerged have been discussed and
|
||||||
|
articulated extensively. A comprehensive and readily consumed form is found in
|
||||||
|
the _Enterprise Integration Patterns_ <<eip>>.
|
||||||
|
|
||||||
The design philosophy of NiFi
|
Over the years, dataflow has been one of those necessary evils in an
|
||||||
-----------------------------
|
architecture. Now, though, there are a number of active and rapidly evolving
|
||||||
FBP, ...
|
movements making dataflow a lot more interesting and a lot more vital to the
|
||||||
|
success of a given enterprise. These include things like Service Oriented
|
||||||
|
Architecture <<soa>>, the rise of the API <<api>><<api2>>, Internet of Things <<iot>>,
|
||||||
|
and Big Data <<bigdata>>. In addition, the level of rigor necessary for
|
||||||
|
compliance, privacy, and security is constantly on the rise. Even so, with
|
||||||
|
all of these new concepts coming about, the patterns and needs of dataflow are
|
||||||
|
still largely the same. The primary differences then are the scope of
|
||||||
|
complexity, the rate of change necessary to adapt, and that at scale
|
||||||
|
the edge case becomes a common occurrence. NiFi is built to help tackle these
|
||||||
|
modern dataflow challenges.
|
||||||
|
|
||||||
Key Features
|
The core concepts of NiFi
|
||||||
------------
|
-------------------------
|
||||||
UI, compponent-based, high performance, provenance
|
|
||||||
|
|
||||||
|
NiFi's fundamental design concepts closely relate to the main ideas of Flow Based
|
||||||
|
Programming <<fbp>>. Here are some of
|
||||||
|
the main NiFi concepts and how they map to FBP:
|
||||||
|
[grid="rows"]
|
||||||
|
[options="header",cols="3,3,10"]
|
||||||
|
|===========================
|
||||||
|
| NiFi Term | FBP Term| Description
|
||||||
|
|
||||||
|
| FlowFile | Information Packet |
|
||||||
|
A FlowFile represents each object moving through the system, and for each one NiFi
|
||||||
|
keeps track of a Map of key/value pair attribute strings and its associated
|
||||||
|
content of zero or more bytes.
|
||||||
|
|
||||||
|
| FlowFile Processor | Black Box |
|
||||||
|
Processors are what actually perform the work. In <<eip>> terms a processor is
|
||||||
|
doing some combination of data Routing, Transformation, or mediation between
|
||||||
|
systems. Processors have access to attributes of a given FlowFile and its
|
||||||
|
content stream. Processors can operate on zero or more FlowFiles in a given unit of work
|
||||||
|
and either commit that work or roll it back.
|
||||||
|
|
||||||
|
| Connection | Bounded Buffer |
|
||||||
|
Connections provide the actual linkage between processors. These act as queues
|
||||||
|
and allow various processes to interact at differing rates. These queues then
|
||||||
|
can be prioritized dynamically and can have upper bounds on load which enables
|
||||||
|
back pressure.
|
||||||
|
|
||||||
|
| Flow Controller | Scheduler |
|
||||||
|
The Flow Controller maintains the knowledge of how processes actually connect
|
||||||
|
and manages the threads and allocations thereof which all processes use. The
|
||||||
|
Flow Controller acts as the broker facilitating the exchange of FlowFiles
|
||||||
|
between processors.
|
||||||
|
|
||||||
|
| Process Group | Subnet |
|
||||||
|
A Process Group is a specific set of processes and their connections which can
|
||||||
|
receive data via input ports and which can send data out via output ports. In
|
||||||
|
this manner process groups allow creation of entirely new components simply by
|
||||||
|
composition of other components.
|
||||||
|
|
||||||
|
|===========================
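
To make the FlowFile, Processor, and Connection rows more concrete, here is a minimal sketch in Python. It is not NiFi's API: the class names, the `max_queued` parameter, and `uppercase_processor` are hypothetical and exist only for illustration. It assumes a FlowFile is a set of string attributes plus a byte payload, and that a Connection is a bounded queue whose refusal to accept more items is what produces back pressure.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass(frozen=True)
class FlowFile:
    """Illustrative stand-in: string attributes plus zero or more bytes of content."""
    attributes: Dict[str, str] = field(default_factory=dict)
    content: bytes = b""


class Connection:
    """A bounded FIFO queue linking two processors (hypothetical, not NiFi's API)."""

    def __init__(self, max_queued: int) -> None:
        self.max_queued = max_queued
        self._queue: Deque[FlowFile] = deque()

    def __len__(self) -> int:
        return len(self._queue)

    def is_full(self) -> bool:
        return len(self._queue) >= self.max_queued

    def offer(self, flow_file: FlowFile) -> bool:
        """Enqueue unless the bound is reached; False signals back pressure."""
        if self.is_full():
            return False
        self._queue.append(flow_file)
        return True

    def poll(self) -> Optional[FlowFile]:
        """Return the oldest queued FlowFile, or None if the queue is empty."""
        return self._queue.popleft() if self._queue else None


def uppercase_processor(source: Connection, destination: Connection) -> int:
    """Move FlowFiles downstream, upper-casing their content.

    Stops as soon as the destination is full, so a slow consumer
    naturally throttles this processor. Returns the number moved.
    """
    moved = 0
    while not destination.is_full():
        flow_file = source.poll()
        if flow_file is None:
            break
        destination.offer(FlowFile(dict(flow_file.attributes), flow_file.content.upper()))
        moved += 1
    return moved
```

With a source holding five FlowFiles and a destination bounded at two, a single call moves two FlowFiles and leaves three queued upstream until the destination drains.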
|
||||||
|
|
||||||
|
This design model, also similar to <<seda>>, has many beneficial consequences that help NiFi
|
||||||
|
to be a very effective platform for building powerful and scalable dataflows.
|
||||||
|
A few of these benefits include:
|
||||||
|
|
||||||
|
* Lends itself well to visual creation and management of directed graphs of processors
|
||||||
|
* Is inherently asynchronous which allows for very high throughput and natural buffering even as processing and flow rates fluctuate
|
||||||
|
* Provides a highly concurrent model without a developer having to worry about the typical complexities of concurrency
|
||||||
|
* Promotes the development of cohesive and loosely coupled components that can be reused in other contexts, and promotes testable units
|
||||||
|
* The resource-constrained connections make critical functions such as back-pressure and pressure release very natural and intuitive
|
||||||
|
* Error handling becomes as natural as the happy path rather than a coarse-grained catch-all
|
||||||
|
* The points at which data enters and exits the system as well as how it flows through are well understood and easily tracked
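
The error-handling and unit-of-work points above can be sketched as follows. This is an illustrative model only, not NiFi's API: the `Session` class, the route names `success` and `failure`, and `parse_int_processor` are all assumptions made for the example. The idea is that failures are routed to their own queue like any other output, and that nothing becomes visible downstream until the unit of work is committed.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Session:
    """Hypothetical unit of work over an input queue and named output queues."""

    def __init__(self, source: Deque[bytes], routes: Dict[str, Deque[bytes]]) -> None:
        self._source = source
        self._routes = routes
        self._taken: List[bytes] = []               # items pulled in this unit of work
        self._staged: List[Tuple[str, bytes]] = []  # outputs not yet published

    def get(self) -> Optional[bytes]:
        """Take the next input item, remembering it in case of rollback."""
        if not self._source:
            return None
        item = self._source.popleft()
        self._taken.append(item)
        return item

    def transfer(self, item: bytes, route: str) -> None:
        """Stage an item for a named route; nothing is published yet."""
        if route not in self._routes:
            raise KeyError(route)
        self._staged.append((route, item))

    def commit(self) -> None:
        """Publish all staged items to their routes."""
        for route, item in self._staged:
            self._routes[route].append(item)
        self._taken.clear()
        self._staged.clear()

    def rollback(self) -> None:
        """Discard staged output and restore the inputs in their original order."""
        self._source.extendleft(reversed(self._taken))
        self._taken.clear()
        self._staged.clear()


def parse_int_processor(session: Session) -> None:
    """Route payloads that parse as integers to 'success', the rest to 'failure'."""
    while (item := session.get()) is not None:
        try:
            int(item)
            session.transfer(item, "success")
        except ValueError:
            session.transfer(item, "failure")
```

Calling `rollback()` after the processor runs leaves the source queue exactly as it was, while `commit()` makes both the successful and the failed items available to whatever consumes each route.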
|
||||||
|
|
||||||
|
Dataflow Challenges : NiFi Features
|
||||||
|
-----------------------------------
|
||||||
|
* Systems fail
|
||||||
|
** Explanation: Networks fail, disks fail, software crashes, people make mistakes.
|
||||||
|
** Features: Fault-tolerance, buffering, durability, flow-specific QoS, data provenance, recovery/go back in time, visual command and control
|
||||||
|
* Data access exceeds capacity to consume
|
||||||
|
** Explanation: Sometimes a given data source can outpace some part of the processing or delivery chain - it only takes one weak link to have an issue.
|
||||||
|
** Features: Prioritization, Back-pressure, congestion-avoidance, QoS (some things are critical and some are not)
|
||||||
|
* Boundary conditions are mere suggestions
|
||||||
|
** Explanation: You will get data that is too big, too small, too fast, too slow, corrupt, wrong, or in the wrong format.
|
||||||
|
** Features: Flow-specific latency vs throughput tradeoffs, flow-specific loss tolerance vs guaranteed delivery, extensible transformations
|
||||||
|
* What is noise one day becomes signal the next
|
||||||
|
** Explanation: Priorities of an organization change - rapidly. Enabling new flows and changing existing ones must be fast.
|
||||||
|
** Features: Dynamic prioritization of data. Go back in time (rolling buffer of recorded history). Real-time visual command and control. Changes are immediate and fine-grained.
|
||||||
|
* Compliance and security
|
||||||
|
** Explanation: Laws and regulations change. Business to business agreements change. System to system and system to user interactions must be secure and trusted.
|
||||||
|
** Features: 2-Way SSL. Pluggable authentication and authorization. Data provenance.
|
||||||
|
* Continuous improvement occurs in production
|
||||||
|
** Explanation: It is often not possible to come even close to replicating production environments in the lab.
|
||||||
|
** Features: Flow-specific QoS. Cheap copy-on-write. Data provenance. It is safe to tee a flow to an unreliable or non-production system.
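
As a rough illustration of the "dynamic prioritization of data" feature listed above, the sketch below shows a queue whose ordering rule can be swapped while items are still queued. It is an assumption-laden toy, not NiFi's implementation: `PrioritizedQueue`, `set_prioritizer`, and the `entry_time` and `priority` attribute names are hypothetical, and the linear scan on each `take` favors clarity over speed.

```python
from typing import Callable, Dict, List, Optional

Attributes = Dict[str, str]
Prioritizer = Callable[[Attributes], int]


def oldest_first(attrs: Attributes) -> int:
    """Lower entry time is served first (attribute name is hypothetical)."""
    return int(attrs["entry_time"])


def highest_priority_first(attrs: Attributes) -> int:
    """Higher priority is served first (attribute name is hypothetical)."""
    return -int(attrs["priority"])


class PrioritizedQueue:
    """A queue whose ordering rule can change while items are queued."""

    def __init__(self, prioritizer: Prioritizer) -> None:
        self._items: List[Attributes] = []
        self._prioritizer = prioritizer

    def set_prioritizer(self, prioritizer: Prioritizer) -> None:
        """Takes effect on the next take(); queued items are not touched."""
        self._prioritizer = prioritizer

    def put(self, attrs: Attributes) -> None:
        self._items.append(attrs)

    def take(self) -> Optional[Attributes]:
        """Remove and return the item ranked first by the current prioritizer."""
        if not self._items:
            return None
        best = min(range(len(self._items)),
                   key=lambda i: self._prioritizer(self._items[i]))
        return self._items.pop(best)
```

Because the rule is applied at `take` time rather than at `put` time, switching from `oldest_first` to `highest_priority_first` reorders everything already waiting without re-queuing it.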
|
||||||
|
|
||||||
|
# References
|
||||||
|
[bibliography]
|
||||||
|
- [[[eip]]] Gregor Hohpe. Enterprise Integration Patterns [online]. Retrieved: 27 Dec 2014, from: http://www.enterpriseintegrationpatterns.com/
|
||||||
|
- [[[soa]]] Wikipedia. Service Oriented Architecture [online]. Retrieved: 27 Dec 2014, from: http://en.wikipedia.org/wiki/Service-oriented_architecture
|
||||||
|
- [[[api]]] Eric Savitz. Welcome to the API Economy [online]. Forbes.com. Retrieved: 27 Dec 2014, from: http://www.forbes.com/sites/ciocentral/2012/08/29/welcome-to-the-api-economy/
|
||||||
|
- [[[api2]]] Adam Duvander. The rise of the API economy and consumer-led ecosystems [online]. thenextweb.com. Retrieved: 27 Dec 2014, from: http://thenextweb.com/dd/2014/03/28/api-economy/
|
||||||
|
- [[[iot]]] Wikipedia. Internet of Things [online]. Retrieved: 27 Dec 2014, from: http://en.wikipedia.org/wiki/Internet_of_Things
|
||||||
|
- [[[bigdata]]] Wikipedia. Big Data [online]. Retrieved: 27 Dec 2014, from: http://en.wikipedia.org/wiki/Big_data
|
||||||
|
- [[[fbp]]] Wikipedia. Flow Based Programming [online]. Retrieved: 28 Dec 2014, from: http://en.wikipedia.org/wiki/Flow-based_programming#Concepts
|
||||||
|
- [[[seda]]] Matt Welsh. Harvard. SEDA: An Architecture for Highly Concurrent Server Applications [online]. Retrieved: 28 Dec 2014, from: http://www.eecs.harvard.edu/~mdw/proj/seda/
|
|
@ -16,6 +16,8 @@
|
||||||
//
|
//
|
||||||
NiFi User Guide
|
NiFi User Guide
|
||||||
===============
|
===============
|
||||||
|
Apache NiFi Team <dev@nifi.incubator.apache.org>
|
||||||
|
:homepage: http://nifi.incubator.apache.org
|
||||||
|
|
||||||
[template="glossary", id="terminology"]
|
[template="glossary", id="terminology"]
|
||||||
Terminology
|
Terminology
|
||||||
|
|
3
pom.xml
|
@ -12,7 +12,8 @@
|
||||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
See the License for the specific language governing permissions and
|
See the License for the specific language governing permissions and
|
||||||
limitations under the License.
|
limitations under the License.
|
||||||
--><project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
|
-->
|
||||||
|
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
|
||||||
<modelVersion>4.0.0</modelVersion>
|
<modelVersion>4.0.0</modelVersion>
|
||||||
<parent>
|
<parent>
|
||||||
<groupId>org.apache</groupId>
|
<groupId>org.apache</groupId>
|
||||||
|
|