mirror of https://github.com/apache/nifi.git
NIFI-973: Created a Getting Started Guide
Signed-off-by: Mark Payne <markap14@hotmail.com>
//
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
Getting Started with Apache NiFi
================================
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org

Who is This Guide For?
----------------------

This guide is written for users who have never used, have had limited exposure to, or have only accomplished
specific tasks within NiFi. It is not intended to be an exhaustive instruction manual or a reference guide. The
link:user-guide.html[User Guide] provides a great deal of information and is intended to be a much more exhaustive
resource; it is very useful as a reference guide, as well. This guide, in comparison, is intended to provide users
with just the information needed in order to understand how to work with NiFi so that they can quickly and easily
build powerful and agile dataflows.

Because some of the information in this guide is applicable only for first-time users while other
information may be applicable for those who have used NiFi a bit, this guide is broken up into
several different sections, some of which may not be useful for some readers. Feel free to jump to
the sections that are most appropriate for you.

This guide does expect that the user has a basic understanding of what NiFi is, as it does not delve into
that level of detail. That information can be found in the link:overview.html[Overview] documentation.

Terminology Used in This Guide
------------------------------

In order to talk about NiFi, there are a few key terms that readers should be familiar with.
We will explain those NiFi-specific terms here, at a high level.

*FlowFile*: Each piece of "User Data" (i.e., data that the user brings into NiFi for processing and distribution) is
referred to as a FlowFile. A FlowFile is made up of two parts: Attributes and Content. The Content is the User Data
itself. Attributes are key-value pairs that are associated with the User Data.

*Processor*: The Processor is the NiFi component that is responsible for creating, sending, receiving, transforming, routing,
splitting, merging, and processing FlowFiles. It is the most important building block available to NiFi users to build their
dataflows.

Downloading and Installing NiFi
-------------------------------

NiFi can be downloaded from the link:http://nifi.apache.org/download.html[NiFi Downloads Page]. There are two packaging options
available: a "tarball" that is tailored more to Linux and a zip file that is more applicable for Windows users. Mac OSX users
may also use the tarball or can install via Homebrew.

To install via Homebrew, simply run the command `brew install nifi`.

For users who are not running OSX or do not have Homebrew installed, after downloading the version of NiFi that you
would like to use, simply extract the archive to the location from which you wish to run the application.

For information on how to configure the instance of NiFi (for instance, to configure security, data storage
configuration, or the port that NiFi is running on), see the link:administration-guide.html[Admin Guide].

Starting NiFi
-------------

Once NiFi has been downloaded and installed as described above, it can be started by using the mechanism
appropriate for your operating system.

=== For Windows Users

For Windows users, navigate to the folder where NiFi was installed. Within this folder is a subfolder
named `bin`. Navigate to this subfolder and double-click the `run-nifi.bat` file.

This will launch NiFi and leave it running in the foreground. To shut down NiFi, select the window that
was launched and hold the Ctrl key while pressing C.

=== For Linux/Mac OSX users

For Linux and OSX users, use a Terminal window to navigate to the directory where NiFi was installed.
To run NiFi in the foreground, run `bin/nifi.sh run`. This will leave the application running until
the user presses Ctrl-C. At that time, it will initiate shutdown of the application.

To run NiFi in the background, instead run `bin/nifi.sh start`. This will initiate the application to
begin running. To check the status and see if NiFi is currently running, execute the command `bin/nifi.sh status`.
NiFi can be shut down by executing the command `bin/nifi.sh stop`.

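For example, a typical Terminal session that manages a background instance might look like the following
(the installation directory name is illustrative):

----
$ cd nifi-0.3.0        # or wherever NiFi was extracted
$ bin/nifi.sh start    # launch NiFi in the background
$ bin/nifi.sh status   # verify that the application is running
$ bin/nifi.sh stop     # shut the application down
----
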
=== Installing as a Service

Currently, installing NiFi as a service is supported only for Linux and Mac OSX users. To install the application
as a service, navigate to the installation directory in a Terminal window and execute the command `bin/nifi.sh install`
to install the service with the default name `nifi`. To specify a custom name for the service, execute the command
with an optional second argument that is the name of the service. For example, to install NiFi as a service with the
name `dataflow`, use the command `bin/nifi.sh install dataflow`.

Once installed, the service can be started and stopped using the appropriate commands, such as `sudo service nifi start`
and `sudo service nifi stop`. Additionally, the running status can be checked via `sudo service nifi status`.

I Started NiFi. Now What?
-------------------------

Now that NiFi has been started, we can bring up the User Interface (UI) in order to create and monitor our dataflow.
To get started, open a web browser and navigate to `http://localhost:8080/nifi`. The port can be changed by
editing the `nifi.properties` file in the NiFi `conf` directory, but the default port is 8080.

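For example, the web port is controlled by the following line in `conf/nifi.properties`, shown here with its
default value (NiFi must be restarted for a change to take effect):

----
nifi.web.http.port=8080
----
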
This will bring up the User Interface, which at this point is a blank canvas for orchestrating a dataflow:

image:new-flow.png["New Flow"]

Near the top of the UI are a few toolbars that will be very important to create your first dataflow:

image:nifi-toolbar-components.png["Toolbar Components"]

=== Adding a Processor

We can now begin creating our dataflow by adding a Processor to our canvas. To do this, drag the Processor icon
image:iconProcessor.png["Processor"] from the top-left of the screen into the middle of the canvas (the graph paper-like
background) and drop it there. This will give us a dialog that allows us to choose which Processor we want to add:

image:add-processor.png["Add Processor"]

We have quite a few options to choose from. For the sake of becoming oriented with the system, let's say that we
just want to bring in files from our local disk into NiFi. When a developer creates a Processor, the developer can
assign "tags" to that Processor. These can be thought of as keywords. We can filter by these tags or by Processor
name by typing into the Filter box in the top-right corner of the dialog. Type in the keywords that you would think
of when wanting to ingest files from a local disk. Typing in the keyword "file", for instance, will provide us with a few
different Processors that deal with files. Filtering by the term "local" will narrow down the list pretty quickly,
as well. If we select a Processor from the list,
we will see a brief description of the Processor near the bottom of the dialog. This should tell us exactly what
the Processor does. The description of the *GetFile* Processor tells us that it pulls data from our local disk
into NiFi and then removes the local file. We can then double-click the Processor type or select it and choose the
`Add` button. The Processor will be added to the canvas in the location that it was dropped.

=== Configuring a Processor

Now that we have added the GetFile Processor, we can configure it by right-clicking on the Processor and choosing
the `Configure` menu item. The provided dialog allows us to configure many different options that can be read about
in the link:user-guide.html[User Guide], but for the sake of this guide, we will focus on the Properties tab. Once
the Properties tab has been selected, we are given a list of several different properties that we can configure
for the Processor. The properties that are available depend on the type of Processor and are generally different
for each type. Properties that are in bold are required properties. The Processor cannot be started until all required
properties have been configured. The most important property to configure for GetFile is the directory from which
to pick up files. If we set the directory name to `./data-in`, this will cause the Processor to start picking up
any data in the `data-in` subdirectory of the NiFi Home directory. We can choose to configure several different
Properties for this Processor. If unsure what a particular Property does, we can hover over the help icon
(image:iconInfo.png["Help"]) next to the Property Name with the mouse in order to read a description of the property. Additionally, the
tooltip that is displayed when hovering over the help icon will provide the default value for that property,
if one exists, information about whether or not the property supports the Expression Language (see the
<<ExpressionLanguage>> section below), and previously configured values for that property.

In order for this property to be valid, create a directory named `data-in` in the NiFi home directory and then
click the `OK` button to close the dialog.

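For example, from a Terminal window in the NiFi home directory, we might create the directory and drop in a test
file for GetFile to pick up (the file name is arbitrary):

----
$ mkdir data-in
$ echo "hello nifi" > data-in/test.txt
----
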
=== Connecting Processors

Each Processor has a set of defined "Relationships" that it is able to send data to. When a Processor finishes handling
a FlowFile, it transfers it to one of these Relationships. This allows a user to configure how to handle FlowFiles based
on the result of processing. For example, many Processors define two Relationships: `success` and `failure`. Users are
then able to configure data to be routed through the flow one way if the Processor is able to successfully process
the data and route the data through the flow in a completely different manner if the Processor cannot process the
data for some reason. Or, depending on the use case, they may simply route both Relationships to the same route through
the flow.

Now that we have added and configured our GetFile Processor and applied the configuration, we can see in the
top-left corner of the Processor an Alert icon (image:iconAlert.png[Alert]) signaling that the Processor is not in a
valid state. Hovering over this icon, we can see that the `success`
relationship has not been defined. This simply means that we have not told NiFi what to do with the data that the Processor
transfers to the `success` Relationship.

In order to address this, let's add another Processor that we can connect the GetFile Processor to, by following
the same steps above. This time, however, we will simply log the attributes that exist for the FlowFile. To do this,
we will add a LogAttribute Processor.

We can now send the output of the GetFile Processor to the LogAttribute Processor. Hover over the GetFile Processor
with the mouse and a Connection Icon (image:iconConnection.png[Connection]) will appear over the middle of the
Processor. We can drag this icon from the GetFile Processor to the LogAttribute
Processor. This gives us a dialog to choose which Relationships we want to include for this connection. Because GetFile
has only a single Relationship, `success`, it is automatically selected for us.

Clicking on the Settings tab provides a handful of options for configuring how this Connection should behave:

image:connection-settings.png[Connection Settings]

We can give the Connection a name, if we like. Otherwise, the Connection name will be based on the selected Relationships.
We can also set an expiration for the data. By default, it is set to "0 sec", which indicates that the data should not
expire. However, we can change the value so that when data in this Connection reaches a certain age, it will automatically
be deleted (and a corresponding EXPIRE Provenance event will be created).

The backpressure thresholds allow us to specify how full the queue is allowed to become before the source Processor is
no longer scheduled to run. This allows us to handle cases where one Processor is capable of producing data faster than
the next Processor is capable of consuming that data. If backpressure is configured for each Connection along the way,
the Processor that is bringing data into the system will eventually experience the backpressure and stop bringing in new
data so that our system has the ability to recover.

Finally, we have the Prioritizers on the right-hand side. These allow us to control how the data in this queue is ordered.
We can drag Prioritizers from the "Available prioritizers" list to the "Selected prioritizers" list in order to activate
the prioritizer. If multiple Prioritizers are activated, they will be evaluated such that the Prioritizer listed first
will be evaluated first, and if two FlowFiles are determined to be equal according to that Prioritizer, the second Prioritizer
will be used.

For the sake of this discussion, we can simply click `Add` to add the Connection to our graph. We should now see that the Alert
icon has changed to a Stopped icon (image:iconStop.png[Stopped]). The LogAttribute Processor, however, is now invalid
because its `success` Relationship has not been connected to
anything. Let's address this by signaling that data that is routed to `success` by LogAttribute should be "Auto Terminated,"
meaning that NiFi should consider the FlowFile's processing complete and "drop" the data. To do this, we configure the
LogAttribute Processor. On the Settings tab, on the right-hand side, we can check the box next to the `success` Relationship
to Auto Terminate the data. Clicking `OK` will close the dialog and show that both Processors are now stopped.

=== Starting and Stopping Processors

At this point, we have two Processors on our graph, but nothing is happening. In order to start the Processors, we can
click on each one individually and then right-click and choose the `Start` menu item. Alternatively, we can select the first
Processor and then hold the Shift key while selecting the other Processor in order to select both. Then, we can
right-click and choose the `Start` menu item. As an alternative to using the context menu, we can select the Processors and
then click the Start icon in the toolbar at the top of the screen.

Once started, the icon in the top-left corner of the Processors will change from a Stopped icon to a Running icon. We can then
stop the Processors in the same manner that we started them, but by using the Stop icon in the toolbar or the Stop menu item
as opposed to the Start ones.

Once a Processor has started, we are not able to configure it anymore. Instead, when we right-click on the Processor, we are
given the option to view its current configuration. In order to configure a Processor, we must first stop the Processor and
wait for any tasks that may be executing to finish. The number of tasks currently executing is shown in the top-right
corner of the Processor, but nothing is shown there if there are currently no tasks.

=== Getting More Info for a Processor

With each Processor having the ability to expose multiple different Properties and Relationships, it can become quite
difficult to remember how all of the different pieces work for each Processor. To address this, you are able to right-click
on a Processor and choose the `Usage` menu item. This will provide you with the Processor's usage information, such as a
description of the Processor, the different Relationships that are available, when the different Relationships are used,
Properties that are exposed by the Processor and their documentation, as well as which FlowFile Attributes (if any) are
expected on incoming FlowFiles and which Attributes (if any) are added to outgoing FlowFiles.

=== Other Components

The toolbar that provides users the ability to drag and drop Processors onto the graph includes several other components
that can be used to build a dataflow. These components include Input and Output Ports, Funnels, Process Groups, and Remote
Process Groups. Due to the intended scope of this document, we will not discuss these elements here, but information is
readily available in the link:user-guide.html#building-dataflow[Building a Dataflow section] of the
link:user-guide.html[User Guide].

What Processors are Available
-----------------------------

In order to create an effective dataflow, users must understand what types of Processors are available to them.
NiFi contains many different Processors out of the box. These Processors provide capabilities to ingest data from
numerous different systems; route, transform, process, split, and aggregate data; and distribute data to many systems.

The number of Processors that are available increases with nearly every release of NiFi. As a result, we will not attempt
to name each of the Processors that are available, but we will highlight some of the most frequently used Processors,
categorizing them by their functions.

=== Data Transformation
- *CompressContent*: Compress or Decompress Content
- *ConvertCharacterSet*: Convert the character set used to encode the Content from one character set to another
- *EncryptContent*: Encrypt or Decrypt Content
- *ReplaceText*: Use Regular Expressions to modify textual Content
- *TransformXml*: Apply an XSLT transform to XML Content

=== Routing and Mediation
- *ControlRate*: Throttle the rate at which data can flow through one part of the flow
- *DetectDuplicate*: Monitor for duplicate FlowFiles, based on some user-defined criteria. Often used in conjunction
with HashContent
- *DistributeLoad*: Load balance or sample data by distributing only a portion of data to each user-defined Relationship
- *MonitorActivity*: Sends a notification when a user-defined period of time elapses without any data coming through a particular
point in the flow. Optionally sends a notification when dataflow resumes.
- *RouteOnAttribute*: Routes a FlowFile based on the attributes that it contains.
- *ScanAttribute*: Scans the user-defined set of Attributes on a FlowFile, checking to see if any of the Attributes match the terms
found in a user-defined dictionary.
- *RouteOnContent*: Searches the Content of a FlowFile to see if it matches any user-defined Regular Expression. If so, the FlowFile is
routed to the configured Relationship.
- *ScanContent*: Searches the Content of a FlowFile for terms that are present in a user-defined dictionary and routes based on the
presence or absence of those terms. The dictionary can consist of either textual entries or binary entries.
- *ValidateXml*: Validates XML Content against an XML Schema; routes the FlowFile based on whether or not its Content
is valid according to the user-defined XML Schema.

=== Database Access
- *ConvertJSONToSQL*: Convert a JSON document into a SQL INSERT or UPDATE command that can then be passed to the PutSQL Processor
- *ExecuteSQL*: Executes a user-defined SQL SELECT command, writing the results to a FlowFile in Avro format
- *PutSQL*: Updates a database by executing the SQL DML statement defined by the FlowFile's content

[[AttributeExtraction]]
=== Attribute Extraction
- *EvaluateJsonPath*: User supplies JSONPath Expressions (similar to XPath, which is used for XML parsing/extraction), and these Expressions
are then evaluated against the JSON Content to either replace the FlowFile Content or extract the value into the user-named Attribute.
- *EvaluateXPath*: User supplies XPath Expressions, and these Expressions are then evaluated against the XML Content to either
replace the FlowFile Content or extract the value into the user-named Attribute.
- *EvaluateXQuery*: User supplies an XQuery query, and this query is then evaluated against the XML Content to either replace the FlowFile
Content or extract the value into the user-named Attribute.
- *ExtractText*: User supplies one or more Regular Expressions that are then evaluated against the textual content of the FlowFile, and the
values that are extracted are then added as user-named Attributes.
- *HashAttribute*: Performs a hashing function against the concatenation of a user-defined list of existing Attributes.
- *HashContent*: Performs a hashing function against the content of a FlowFile and adds the hash value as an Attribute.
- *IdentifyMimeType*: Evaluates the content of a FlowFile in order to determine what type of file the FlowFile encapsulates. This Processor is
capable of detecting many different MIME Types, such as images, word processor documents, text, and compression formats, just to name
a few.
- *UpdateAttribute*: Adds or updates any number of user-defined Attributes to a FlowFile. This is useful for adding statically configured values,
as well as deriving Attribute values dynamically by using the Expression Language. This Processor also provides an "Advanced User Interface,"
allowing users to update Attributes conditionally, based on user-supplied rules.

=== System Interaction
- *ExecuteProcess*: Runs the user-defined Operating System command. The Process's StdOut is redirected such that the content that is written
to StdOut becomes the content of the outbound FlowFile. This Processor is a Source Processor - its output is expected to generate a new FlowFile,
and the system call is expected to receive no input. In order to provide input to the process, use the ExecuteStreamCommand Processor.
- *ExecuteStreamCommand*: Runs the user-defined Operating System command. The contents of the FlowFile are optionally streamed to the StdIn
of the process. The content that is written to StdOut becomes the content of the outbound FlowFile. This Processor cannot be used as a Source
Processor - it must be fed incoming FlowFiles in order to perform its work. To perform the same type of functionality with a Source Processor,
see the ExecuteProcess Processor.

=== Data Ingestion
- *GetFile*: Streams the contents of a file from a local disk (or network-attached disk) into NiFi and then deletes the original file. This
Processor is expected to move the file from one location to another location and is not to be used for copying the data.
- *GetFTP*: Downloads the contents of a remote file via FTP into NiFi and then deletes the original file. This Processor is expected to move
the data from one location to another location and is not to be used for copying the data.
- *GetSFTP*: Downloads the contents of a remote file via SFTP into NiFi and then deletes the original file. This Processor is expected to move
the data from one location to another location and is not to be used for copying the data.
- *GetJMSQueue*: Downloads a message from a JMS Queue and creates a FlowFile based on the contents of the JMS message. The JMS Properties are
optionally copied over as Attributes, as well.
- *GetJMSTopic*: Downloads a message from a JMS Topic and creates a FlowFile based on the contents of the JMS message. The JMS Properties are
optionally copied over as Attributes, as well. This Processor supports both durable and non-durable subscriptions.
- *GetHTTP*: Downloads the contents of a remote HTTP- or HTTPS-based URL into NiFi. The Processor will remember the ETag and Last-Modified Date
in order to ensure that the data is not continually ingested.
- *ListenHTTP*: Starts an HTTP (or HTTPS) Server and listens for incoming connections. For any incoming POST request, the contents of the request
are written out as a FlowFile, and a 200 response is returned.
- *ListenUDP*: Listens for incoming UDP packets and creates a FlowFile per packet or per bundle of packets (depending on configuration) and
emits the FlowFile to the 'success' relationship.
- *GetHDFS*: Monitors a user-specified directory in HDFS. Whenever a new file enters HDFS, it is copied into NiFi and deleted from HDFS. This
Processor is expected to move the file from one location to another location and is not to be used for copying the data. This Processor is also
expected to be run On Primary Node only, if run within a cluster. In order to copy the data from HDFS and leave it intact, or to stream the data
from multiple nodes in the cluster, see the ListHDFS Processor.
- *ListHDFS* / *FetchHDFS*: ListHDFS monitors a user-specified directory in HDFS and emits a FlowFile containing the filename for each file that it
encounters. It then persists this state across the entire NiFi cluster by way of a Distributed Cache. These FlowFiles can then be fanned out across
the cluster and sent to the FetchHDFS Processor, which is responsible for fetching the actual content of those files and emitting FlowFiles that contain
the content fetched from HDFS.
- *FetchS3Object*: Fetches the contents of an object from the Amazon Web Services (AWS) Simple Storage Service (S3). The outbound FlowFile contains the
contents received from S3.
- *GetKafka*: Consumes messages from Apache Kafka. The messages can be emitted as a FlowFile per message or can be batched together using a user-specified
delimiter.
- *GetMongo*: Executes a user-specified query against MongoDB and writes the contents to a new FlowFile.
- *GetTwitter*: Allows users to register a filter to listen to the Twitter "garden hose" or Enterprise endpoint, creating a FlowFile for each tweet
that is received.

=== Data Egress / Sending Data
- *PutEmail*: Sends an E-mail to the configured recipients. The content of the FlowFile is optionally sent as an attachment.
- *PutFile*: Writes the contents of a FlowFile to a directory on the local (or network attached) file system.
- *PutFTP*: Copies the contents of a FlowFile to a remote FTP Server.
- *PutSFTP*: Copies the contents of a FlowFile to a remote SFTP Server.
- *PutJMS*: Sends the contents of a FlowFile as a JMS message to a JMS broker, optionally adding JMS Properties based on Attributes.
- *PutSQL*: Executes the contents of a FlowFile as a SQL DML statement (INSERT, UPDATE, or DELETE). The contents of the FlowFile must be a valid
SQL statement. Attributes can be used as parameters so that the contents of the FlowFile can be parameterized SQL statements, avoiding
SQL injection attacks (see the sketch after this list).
- *PutKafka*: Sends the contents of a FlowFile to Kafka as a message. The FlowFile can be sent as a single message, or a delimiter, such as a
new-line, can be specified in order to send many messages for a single FlowFile.
- *PutMongo*: Sends the contents of a FlowFile to Mongo as an INSERT or an UPDATE.

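As a sketch of how PutSQL's parameterization works: the FlowFile content holds the statement with `?` placeholders, and
Attributes supply the values. The attribute naming convention shown here (`sql.args.N.type` / `sql.args.N.value`, where the
type is a `java.sql.Types` constant) follows the Processor's usage documentation; the table and values are hypothetical:

----
FlowFile content:  INSERT INTO users (name, email) VALUES (?, ?)

Attributes:        sql.args.1.type  = 12                   # 12 = VARCHAR
                   sql.args.1.value = alice
                   sql.args.2.type  = 12
                   sql.args.2.value = alice@example.com
----
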
=== Splitting and Aggregation
- *SplitText*: SplitText takes in a single FlowFile whose contents are textual and splits it into 1 or more FlowFiles based on the configured
number of lines. For example, the Processor can be configured to split a FlowFile into many FlowFiles, each of which is only 1 line.
- *SplitJson*: Allows the user to split a JSON object that is comprised of an array or many child objects into a FlowFile per JSON element.
- *SplitXml*: Allows the user to split an XML message into many FlowFiles, each containing a segment of the original. This is generally used when
several XML elements have been joined together with a "wrapper" element. This Processor then allows those elements to be split out into individual
XML elements.
- *UnpackContent*: Unpacks different types of archive formats, such as ZIP and TAR. Each file within the archive is then transferred as a single
FlowFile.
- *MergeContent*: This Processor is responsible for merging many FlowFiles into a single FlowFile. The FlowFiles can be merged by concatenating their
content together along with optional header, footer, and demarcator, or by specifying an archive format, such as ZIP or TAR. FlowFiles can be binned
together based on a common attribute, or can be "defragmented" if they were split apart by some other Splitting process. The minimum and maximum
size of each bin is user-specified, based on number of elements or total size of FlowFiles' contents, and an optional Timeout can be assigned as well,
so that FlowFiles will only wait for their bin to become full for a certain amount of time.
- *SegmentContent*: Segments a FlowFile into potentially many smaller FlowFiles based on some configured data size. The splitting is not performed
against any sort of demarcator but rather just based on byte offsets. This is used before transmitting FlowFiles in order to provide lower latency
by sending many different pieces in parallel. On the other side, these FlowFiles can then be reassembled by the MergeContent Processor using the
Defragment mode.
- *SplitContent*: Splits a single FlowFile into potentially many FlowFiles, similarly to SegmentContent. However, with SplitContent, the splitting
is not performed on arbitrary byte boundaries but rather a byte sequence is specified on which to split the content.

=== HTTP
- *GetHTTP*: Downloads the contents of a remote HTTP- or HTTPS-based URL into NiFi. The Processor will remember the ETag and Last-Modified Date
in order to ensure that the data is not continually ingested.
- *ListenHTTP*: Starts an HTTP (or HTTPS) Server and listens for incoming connections. For any incoming POST request, the contents of the request
are written out as a FlowFile, and a 200 response is returned (see the example after this list).
- *InvokeHTTP*: Performs an HTTP Request that is configured by the user. This Processor is much more versatile than the GetHTTP and PostHTTP
Processors but requires a bit more configuration. This Processor cannot be used as a Source Processor and is required to have incoming FlowFiles
in order to be triggered to perform its task.
- *PostHTTP*: Performs an HTTP POST request, sending the contents of the FlowFile as the body of the message. This is often used in conjunction
with ListenHTTP in order to transfer data between two different instances of NiFi in cases where Site-to-Site cannot be used (for instance,
when the nodes cannot access each other directly but are able to communicate through an HTTP proxy).
- *HandleHttpRequest* / *HandleHttpResponse*: The HandleHttpRequest Processor is a Source Processor that starts an embedded HTTP(S) server
similarly to ListenHTTP. However, it does not send a response to the client. Instead, the FlowFile is sent out with the body of the HTTP request
as its contents, and all of the typical Servlet parameters, headers, etc. as Attributes. The HandleHttpResponse is then able to
send a response back to the client after the FlowFile has finished being processed. These Processors are always expected to be used in conjunction
with one another and allow the user to visually create a Web Service within NiFi. This is particularly useful for adding a front-end to a
non-web-based protocol or for adding a simple web service around some functionality that is already performed by NiFi, such as data format conversion.

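As an illustration, once a ListenHTTP Processor is started (assume it is configured with a Listening Port of `8081` and a
Base Path of `contentListener`; both values are illustrative and depend on your configuration), data can be pushed into
the flow with any HTTP client:

----
$ curl -X POST --data-binary @myfile.txt http://localhost:8081/contentListener
----
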
=== Amazon Web Services
- *FetchS3Object*: Fetches the content of an object stored in Amazon Simple Storage Service (S3). The content that is retrieved from S3
is then written to the content of the FlowFile.
- *PutS3Object*: Writes the contents of a FlowFile to an Amazon S3 object using the configured credentials, key, and bucket name.
- *PutSNS*: Sends the contents of a FlowFile as a notification to the Amazon Simple Notification Service (SNS).
- *GetSQS*: Pulls a message from the Amazon Simple Queuing Service (SQS) and writes the contents of the message to the content of the FlowFile.
- *PutSQS*: Sends the contents of a FlowFile as a message to the Amazon Simple Queuing Service (SQS).
- *DeleteSQS*: Deletes a message from the Amazon Simple Queuing Service (SQS). This can be used in conjunction with GetSQS in order to receive
a message from SQS, perform some processing on it, and then delete the object from the queue only after it has successfully completed processing.

Working With Attributes
-----------------------
Each FlowFile is created with several Attributes, and these Attributes will change over the life of
the FlowFile. The concept of a FlowFile is extremely powerful and provides three primary benefits.
First, it allows the user to make routing decisions in the flow so that FlowFiles that meet some criteria
can be handled differently than other FlowFiles. This is done using the RouteOnAttribute and similar Processors.

Secondly, Attributes are used in order to configure Processors in such a way that the configuration of the
Processor is dependent on the data itself. For instance, the PutFile Processor is able to use the Attributes in order
to know where to store each FlowFile, even though the directory and filename Attributes may be different for each FlowFile.

Finally, the Attributes provide extremely valuable context about the data. This is useful when reviewing the Provenance
data for a FlowFile. This allows the user to search for Provenance data that match specific criteria, and it also allows
the user to view this context when inspecting the details of a Provenance Event. By doing this, the user is then able
to gain valuable insight as to why the data was processed one way or another, simply by glancing at this context that is
carried along with the content.

=== Common Attributes

Each FlowFile has a minimum set of Attributes:

- *filename*: A filename that can be used to store the data to a local or remote file system.
- *path*: The name of a directory that can be used to store the data to a local or remote file system.
- *uuid*: A Universally Unique Identifier that distinguishes the FlowFile from other FlowFiles in the system.
- *entryDate*: The date and time at which the FlowFile entered the system (i.e., was created). The value of this
attribute is a number that represents the number of milliseconds since midnight, Jan. 1, 1970 (UTC).
- *lineageStartDate*: Any time that a FlowFile is cloned, merged, or split, this results in a "child" FlowFile being
created. As those children are then cloned, merged, or split, a chain of ancestors is built. This value represents
the date and time at which the oldest ancestor entered the system. Another way to think about this is that this
attribute represents the latency of the FlowFile through the system. The value is a number that represents the number
of milliseconds since midnight, Jan. 1, 1970 (UTC).
- *fileSize*: This attribute represents the number of bytes taken up by the FlowFile's Content.

Note that the `uuid`, `entryDate`, `lineageStartDate`, and `fileSize` attributes are system-generated and cannot be changed.

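Because `entryDate` and `lineageStartDate` are stored as milliseconds since epoch, the Expression Language's date
formatting (covered in the <<ExpressionLanguage>> section below) is handy for rendering them readably; an expression
along these lines should produce a human-readable timestamp, relying on the EL's coercion of numbers to dates:

----
${entryDate:format('yyyy/MM/dd HH:mm:ss')}
----
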
=== Extracting Attributes

NiFi provides several different Processors out of the box for extracting Attributes from FlowFiles. A list of commonly used
Processors for this purpose can be found above in the <<AttributeExtraction>> section. This is a very common use case for building
custom Processors, as well. Many Processors are written to understand a specific data format and extract pertinent information from
a FlowFile's content, creating Attributes to hold that information, so that decisions can then be made about how to route or
process the data.

=== Adding User-Defined Attributes

In addition to having Processors that are able to extract particular pieces of information from FlowFile content into Attributes,
it is also common for users to want to add their own user-defined Attributes to each FlowFile at a particular place in the flow.
The UpdateAttribute Processor is designed specifically for this purpose. Users are able to add a new property to the Processor
in the Configure dialog by clicking the "New Property" button in the top-right corner of the Properties tab. The user is then
prompted to enter the name of the property and then a value. For each FlowFile that is processed by this UpdateAttribute
Processor, an Attribute will be added for each user-defined property. The name of the Attribute will be the same as the name of
the property that was added. The value of the Attribute will be the same as the value of the property.

The value of the property may contain the Expression Language, as well. This allows Attributes to be modified or added
based on other Attributes. For example, if we want to prepend the hostname that is processing a file, as well as the date, to
a filename, we could do this by adding a property with the name `filename` and the value `${hostname()}-${now():format('yyyy-dd-MM')}-${filename}`.
While this may seem confusing at first, the section below on <<ExpressionLanguage>> will help to clear up what is going on
here.

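To make the example concrete, here is how that property might play out for a single FlowFile (the hostname, date, and
incoming filename are hypothetical; note that the `'yyyy-dd-MM'` format string from the example renders as year-day-month):

----
Property name      : filename
Property value     : ${hostname()}-${now():format('yyyy-dd-MM')}-${filename}

Incoming filename  : data.csv          (hypothetical)
Hostname           : nifi-01           (hypothetical)
Date               : Sept. 25, 2015    (hypothetical)
Resulting filename : nifi-01-2015-25-09-data.csv
----
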
In addition to always adding a defined set of Attributes, the UpdateAttribute Processor has an Advanced UI that allows the user
to configure a set of rules for which Attributes should be added when. To access this capability, in the Configure dialog's
Properties tab, click the `Advanced...` button at the bottom of the dialog. This will provide a UI that is tailored specifically
to this Processor, rather than the simple Properties table that is provided for all Processors. Within this UI, the user is able
to configure, essentially, a rules engine, specifying rules that must match in order to have the configured Attributes added
to the FlowFile.

=== Routing on Attributes

One of the most powerful features of NiFi is the ability to route FlowFiles based on their Attributes. The primary mechanism
for doing this is the RouteOnAttribute Processor. This Processor, like UpdateAttribute, is configured by adding user-defined properties.
Any number of properties can be added by clicking the "New Property" icon in the top-right corner of the Properties tab in the
Processor's Configure dialog.

Each FlowFile's Attributes will be compared against the configured properties to determine whether or not the FlowFile meets the
specified criteria. The value of each property is expected to be an Expression Language expression that returns a boolean value.
For more on the Expression Language, see the <<ExpressionLanguage>> section below.

After evaluating the Expression Language expressions provided against the FlowFile's Attributes, the Processor determines how to
route the FlowFile based on the Routing Strategy selected. The most common strategy is the "Route to Property name" strategy. With this
strategy selected, the Processor will expose a Relationship for each property configured. If the FlowFile's Attributes satisfy the given
expression, a copy of the FlowFile will be routed to the corresponding Relationship. For example, if we had a new property with the name
"begins-with-r" and the value "${filename:startsWith('r')}", then any FlowFile whose filename starts with the letter 'r' will be routed
to that Relationship. All other FlowFiles will be routed to 'unmatched'.

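A minimal sketch of such a configuration, assuming the "Route to Property name" strategy and two hypothetical
user-defined properties (each property name becomes a Relationship):

----
Routing Strategy  = Route to Property name

begins-with-r     = ${filename:startsWith('r')}
larger-than-1kb   = ${fileSize:toNumber():gt(1024)}
----

With this configuration, the Processor exposes `begins-with-r`, `larger-than-1kb`, and `unmatched` Relationships.
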
[[ExpressionLanguage]]
=== Expression Language / Using Attributes in Property Values

As we extract Attributes from FlowFiles' contents and add user-defined Attributes, they don't do us much good as an operator unless
we have some mechanism by which we can use them. The NiFi Expression Language allows us to access and manipulate FlowFile Attribute
values as we configure our flows. Not all Processor properties allow the Expression Language to be used, but many do. In order to
determine whether or not a property supports the Expression Language, a user can hover over the Help icon
(image:iconInfo.png["Help Icon"]) in the Properties tab of the Processor Configure dialog. This will provide a tooltip that shows a
description of the property, the
default value, if any, and whether or not the property supports the Expression Language.

For properties that do support the Expression Language, it is used by adding an expression within the opening `${` tag and the closing
`}` tag. An expression can be as simple as an attribute name. For example, to reference the `uuid` Attribute, we can simply use the
value `${uuid}`. If the Attribute name begins with any character other than a letter, or if it contains a character other than
a number, a letter, a period (.), or an underscore (_), the Attribute name will need to be quoted. For example, `${My Attribute Name}`
will be invalid, but `${'My Attribute Name'}` will refer to the Attribute `My Attribute Name`.

In addition to referencing Attribute values, we can perform a number of functions and comparisons on those Attributes. For example,
if we want to check if the `filename` attribute contains the letter 'r' without paying attention to case (upper case or lower case),
we can do this by using the expression `${filename:toLower():contains('r')}`. Note here that the functions are separated by colons.
We can chain together any number of functions to build up more complex expressions. It is also important to understand here that even
though we are calling `filename:toLower()`, this does not alter the value of the `filename` Attribute in any way but rather just gives
us a new value to work with.

We can also embed one expression within another. For example, if we wanted to compare the value of the `attr1` Attribute to
the value of the `attr2` Attribute, we can do this with the following expression: `${attr1:equals( ${attr2} )}`.

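Putting the syntax rules together, here are a few illustrative expressions, assuming a FlowFile that has `filename`,
`attr1`, and `attr2` Attributes:

----
${filename}                           the value of the filename Attribute
${filename:toLower():contains('r')}   true if the lower-cased filename contains 'r'
${'My Attribute Name'}                quoting is required when a name contains spaces
${attr1:equals( ${attr2} )}           true if attr1 and attr2 hold the same value
----
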
The Expression Language contains many different functions that can be used in order to perform the tasks needed for routing and manipulating
Attributes. Functions exist for parsing and manipulating strings, comparing string and numeric values, manipulating and replacing values,
and comparing values. A full explanation of the different functions available is out of scope of this document, but the
link:expression-language-guide.html[Expression Language Guide] provides far greater detail for each of the functions.

In addition, this Expression Language guide is built into the application so that users are able to easily see which functions are available
and see their documentation while typing. When setting the value of a property that supports the Expression Language, if the cursor is within
the Expression Language start and end tags, pressing Ctrl + Space on the keyboard will provide a popup of all of the available functions and
will provide auto-complete functionality. Clicking on or using the keyboard to navigate to one of the functions listed in the popup will
cause a tooltip to show, which explains what the function does, the arguments that it expects, and the return type of the function.

Working With Templates
----------------------

As we use Processors to build more and more complex dataflows in NiFi, we often will find that we string together the same sequence
of Processors to perform some task. This can become tedious and inefficient. To address this, NiFi provides a concept of Templates.
A template can be thought of as a reusable sub-flow. To create a template, follow these steps:

- Select the components to include in the template. We can select multiple components by clicking on the first component and then holding
the Shift key while selecting additional components (to include the Connections between those components), or by holding the Shift key
while dragging a box around the desired components on the canvas.
- Select the Create Template Icon (image:iconTemplate.png[Template Icon]) from the middle toolbar at the top of the screen.
- Provide a name and, optionally, comments about the template.
- Click the Create button.

Once we have created a template, we can now use it as a building block in our flow, just as we would a Processor. To do this, we will
click and drag the Template icon from the left-most toolbar onto our canvas. We can then choose the template that we would like to add
to our canvas and click the Add button.

Finally, we have the ability to manage our templates by using the Template Management dialog. To access this dialog, click the Template
icon in the top-right toolbar. From here, we can see which templates exist and filter the templates to find the templates of interest.
On the right-hand side of the table is an icon to Export, or Download, the template as an XML file. This can then be provided to others so
that they can use your template.

To import a template into your NiFi instance, click the Browse button in the top-right corner of the dialog and navigate to the file on
your computer. Then click the Import button. The template will now show up in your table, and you can drag it onto your canvas as you would
any other template that you have created.

There are a few important notes to remember when working with templates:

- Any properties that are identified as being Sensitive Properties (such as a password that is configured in a Processor) will not be added
to the template. These sensitive properties will have to be populated each time that the template is added to the canvas.
- If a component that is included in the template references a Controller Service, the Controller Service will also be added to the template.
This means that each time that the template is added to the graph, it will create a copy of the Controller Service.

Monitoring NiFi
---------------

As data flows through your dataflow in NiFi, it is important to understand how well your system is performing in order to assess whether you
will require more resources and in order to assess the health of your current resources. NiFi provides a few mechanisms for monitoring
your system.

=== Status Bar

Near the top of the NiFi screen is a blue bar that is referred to as the Status Bar. It contains a few important statistics about the current
health of NiFi. The number of Active Threads can indicate how hard NiFi is currently working, and the Queued stat indicates how many FlowFiles
are currently queued across the entire flow, as well as the total size of those FlowFiles.

If the NiFi instance is in a cluster, we will also see an indicator here telling us how many nodes are in the cluster and how many are currently
connected. In this case, the number of active threads and the queue size are the sum across all nodes that are currently connected.

=== Component Statistics

Each Processor, Process Group, and Remote Process Group on the canvas provides several statistics about how much data has been processed
by the component. These statistics provide information about how much data has been processed in the past five minutes. This is a rolling
window and allows us to see things like the number of FlowFiles that have been consumed by a Processor, as well as the number of FlowFiles
that have been emitted by the Processor.

The connections between Processors also expose the number of items that are currently queued.

It may also be valuable to see historical values for these metrics and, if clustered, how the different nodes compare to one another.
In order to see this information, we can right-click on a component and choose the Status menu item. This will show us a graph that spans
the time since NiFi was started, or up to 24 hours, whichever is less. The amount of time that is shown here can be extended or reduced
by changing the configuration in the properties file.

In the top-right corner is a drop-down that allows the user to select which metric they are viewing. The graph on the bottom allows the
user to select a smaller portion of the graph in order to zoom in.

=== Bulletins

In addition to the statistics provided by each component, we as users will want to know if any problems occur. While we could monitor the
logs for anything interesting, it is much more convenient to have notifications pop up on the screen. If a Processor logs
anything as a WARNING or ERROR, we will see a "Bulletin Indicator" show up in the top-left-hand corner of the Processor. This indicator
looks like a sticky note and will be shown for five minutes after the event occurs. Hovering over the bulletin provides information about
what happened so that the user does not have to sift through log messages to find it. If in a cluster, the bulletin will also indicate which
node in the cluster emitted the bulletin. We can also change the log level at which bulletins will occur in the Settings tab of the Configure
dialog for a Processor.

If the framework emits a bulletin, we will also see this bulletin indicator occur in the Status Bar at the top of the screen.
The right-most icon in the NiFi Toolbar is the Bulletin Board icon. Clicking this icon will take us to the bulletin board where
we can see all bulletins that occur across the NiFi instance and can filter based on the component, the message, etc.

Data Provenance
---------------

NiFi keeps a very granular level of detail about each piece of data that it ingests. As the data is processed through
the system and is transformed, routed, split, aggregated, and distributed to other endpoints, this information is
all stored within NiFi's Provenance Repository. In order to search and view this information, we can click the Data Provenance icon (
image:iconProvenance.png[Data Provenance, width=28]) in the top-right corner of the canvas. This will provide us a table that lists
the Provenance events that we have searched for:

image:provenance-table.png[Provenance Table]

Initially, this table is populated with the most recent 1,000 Provenance Events that have occurred (though it may take a few
seconds for the information to be processed after the events occur). From this dialog, there is a Search button that allows the
user to search for events that happened by a particular Processor, for a particular FlowFile by filename or UUID, or by several other
fields. The `nifi.properties` file provides the ability to configure which of these properties are indexed, or made searchable.
Additionally, the properties file also allows you to choose specific FlowFile Attributes that will be indexed. As a result, you can
choose which Attributes will be important to your specific dataflows and make those Attributes searchable.

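For example, the provenance indexing configuration lives in `conf/nifi.properties`; the values below are illustrative,
with the attribute list showing two hypothetical custom Attributes being made searchable:

----
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
nifi.provenance.repository.indexed.attributes=customer.id, batch.id
----
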
[[EventDetails]]
=== Event Details
Once we have performed our search, our table will be populated only with the events that match the search criteria. From here, we
can choose the Info icon (image:iconInfo.png[Info Icon]) on the left-hand side of the table to view the details of that event:

image:event-details.png[Event Details]

From here, we can see exactly when that event occurred, which FlowFile the event affected, which component (Processor, etc.) performed the event,
how long the event took, and the overall time that the data had been in NiFi when the event occurred (total latency).

The next tab provides a listing of all Attributes that existed on the FlowFile at the time that the event occurred:

image:event-attributes.png[Event Attributes]

From here, we can see all the Attributes that existed on the FlowFile when the event occurred, as well as the previous values for those
Attributes. This allows us to know which Attributes changed as a result of this event and how they changed. Additionally, in the right-hand
corner is a checkbox that allows the user to see only those Attributes that changed. This may not be particularly useful if the FlowFile has
only a handful of Attributes, but can be very helpful when a FlowFile has hundreds of Attributes.

This is very important, because it allows the user to understand the exact context in which the FlowFile was processed. It is very helpful
for understanding 'why' the FlowFile was processed the way that it was, especially when the Processor was configured using the Expression Language.

Finally, we have the Content tab:

image:event-content.png[Event Content]

This tab provides us information about where in the Content Repository the FlowFile's content was stored. If the event modified the content
of the FlowFile, we will see the 'before' and 'after' content claims. We are then given the option to Download the content or to View the
content within NiFi itself, if the data format is one that NiFi understands how to render.

Additionally, there is a 'Replay' button that allows the user to re-insert the FlowFile into the flow and re-process it from exactly the point
at which the event happened. This provides a very powerful mechanism, as we are able to modify our flow in real time, re-process a FlowFile,
and then view the results. If they are not as expected, we can modify the flow again and re-process the FlowFile again. We are able to perform
this iterative development of the flow until it is processing the data exactly as intended.

=== Lineage Graph

In addition to viewing the details of a Provenance event, we can also view the lineage of the FlowFile involved by clicking on the Lineage Icon
(image:iconLineage.png[Lineage]) from the table view.

This provides us with a graphical representation of exactly what happened to that piece of data as it traversed the system:

image:lineage-graph-annotated.png[Lineage Graph]

From here, we can right-click on any of the events represented and click the "View Details" menu item to see the <<EventDetails>>.
This graphical representation shows us exactly which events occurred to the data. There are a few "special" event types to be
aware of. If we see a JOIN, FORK, or CLONE event, we can right-click and choose to Find Parents or Expand. This allows us to
see the lineage of the parent FlowFiles and the child FlowFiles that were created, as well.

The slider in the bottom-left corner allows us to see the time at which these events occurred. By sliding it left and right, we can
see which events introduced latency into the system so that we have a very good understanding of where in our system we may need to
provide more resources, such as the number of Concurrent Tasks for a Processor. Or it may reveal, for example, that most of the latency
was introduced by a JOIN event, in which we were waiting for more FlowFiles to join together. In either case, the ability to easily
see where this is occurring is a very powerful feature that will help users to understand how the enterprise is operating.

Where To Go For More Information
--------------------------------

The NiFi community has built up a significant amount of documentation on how to use the software. The following guides are available, in
addition to this Getting Started Guide:

- link:overview.html[Apache NiFi Overview] - Provides an overview of what Apache NiFi is, what it does, and why it was created.
- link:user-guide.html[Apache NiFi User Guide] - A fairly extensive guide that is often used more as a Reference Guide, as it is pretty
lengthy, discussing all of the different components that comprise the application. This guide is written with the NiFi Operator as its
audience. It provides information on each of the different components available in NiFi and explains how to use the different features
provided by the application.
- link:administration-guide.html[Administration Guide] - A guide for setting up and administering Apache NiFi for production environments.
This guide provides information about the different system-level settings, such as setting up clusters of NiFi and securing access to the
web UI and data.
- link:expression-language-guide.html[Expression Language Guide] - A far more exhaustive guide for understanding the Expression Language than
is provided above. This guide is the definitive documentation for the NiFi Expression Language. It provides an introduction to the EL
and an explanation of each function, its arguments, and return types, as well as providing examples.
- link:developer-guide.html[Developer's Guide] - While not an exhaustive guide to All Things NiFi Development, this guide does provide a
comprehensive overview of the different APIs available and how they should be used. In addition, it provides Best Practices for developing
NiFi components and common Processor idioms to help aid in understanding the logic behind many of the existing NiFi components.
- link:https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide[Contributor's Guide] - A guide explaining how to contribute
work back to the Apache NiFi community so that others can make use of it.

Several blog postings have also been added to the Apache NiFi blog site:
link:https://blogs.apache.org/nifi/[https://blogs.apache.org/nifi/]

In addition to the blog and guides provided here, you can browse the different
link:https://nifi.apache.org/mailing_lists.html[NiFi Mailing Lists] or send an e-mail to one of the mailing lists at
link:mailto:users@nifi.apache.org[users@nifi.apache.org] or
link:mailto:dev@nifi.apache.org[dev@nifi.apache.org].

Many of the members of the NiFi community are also available on Twitter and actively monitor for tweets that mention @apachenifi.