mirror of https://github.com/apache/nifi.git
NIFI-973: Created a Getting Started Guide
Signed-off-by: Mark Payne <markap14@hotmail.com>
//
// Licensed to the Apache Software Foundation (ASF) under one or more
// contributor license agreements.  See the NOTICE file distributed with
// this work for additional information regarding copyright ownership.
// The ASF licenses this file to You under the Apache License, Version 2.0
// (the "License"); you may not use this file except in compliance with
// the License.  You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
Getting Started with Apache NiFi
================================
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org

Who is This Guide For?
----------------------

This guide is written for users who have never used, have had limited exposure to, or have only accomplished
specific tasks within NiFi. It is not intended to be an exhaustive instruction manual or a reference guide. The
link:user-guide.html[User Guide] provides a great deal of information and is intended to be a much more exhaustive
resource; it is very useful as a reference guide, as well. This guide, in comparison, is intended to provide users
with just the information needed in order to understand how to work with NiFi so that they can quickly and easily
build powerful and agile dataflows.

Because some of the information in this guide is applicable only for first-time users while other
information may be applicable for those who have used NiFi a bit, this guide is broken up into
several different sections, some of which may not be useful for some readers. Feel free to jump to
the sections that are most appropriate for you.

This guide does expect that the user has a basic understanding of what NiFi is, as it does not delve into
that level of detail. That information can be found in the link:overview.html[Overview] documentation.

Terminology Used in This Guide
------------------------------

In order to talk about NiFi, there are a few key terms that readers should be familiar with.
We will explain those NiFi-specific terms here, at a high level.

*FlowFile*: Each piece of "User Data" (i.e., data that the user brings into NiFi for processing and distribution) is
referred to as a FlowFile. A FlowFile is made up of two parts: Attributes and Content. The Content is the User Data
itself. Attributes are key-value pairs that are associated with the User Data.

*Processor*: The Processor is the NiFi component that is responsible for creating, sending, receiving, transforming, routing,
splitting, merging, and processing FlowFiles. It is the most important building block available to NiFi users to build their
dataflows.

Downloading and Installing NiFi
-------------------------------

NiFi can be downloaded from the link:http://nifi.apache.org/download.html[NiFi Downloads Page]. There are two packaging options
available: a "tarball" that is tailored more to Linux and a zip file that is more applicable for Windows users. Mac OSX users
may also use the tarball or can install via Homebrew.

To install via Homebrew, simply run the command `brew install nifi`.

For users who are not running OSX or do not have Homebrew installed, after downloading the version of NiFi that you
would like to use, simply extract the archive to the location from which you wish to run the application.

For information on how to configure the instance of NiFi (for instance, to configure security, data storage
configuration, or the port that NiFi is running on), see the link:administration-guide.html[Admin Guide].

Starting NiFi
-------------

Once NiFi has been downloaded and installed as described above, it can be started by using the mechanism
appropriate for your operating system.

=== For Windows Users

For Windows users, navigate to the folder where NiFi was installed. Within this folder is a subfolder
named `bin`. Navigate to this subfolder and double-click the `run-nifi.bat` file.

This will launch NiFi and leave it running in the foreground. To shut down NiFi, select the window that
was launched and hold the Ctrl key while pressing C.

=== For Linux/Mac OSX users

For Linux and OSX users, use a Terminal window to navigate to the directory where NiFi was installed.
To run NiFi in the foreground, run `bin/nifi.sh run`. This will leave the application running until
the user presses Ctrl-C. At that time, it will initiate shutdown of the application.

To run NiFi in the background, instead run `bin/nifi.sh start`. This will initiate the application to
begin running. To check the status and see if NiFi is currently running, execute the command `bin/nifi.sh status`.
NiFi can be shut down by executing the command `bin/nifi.sh stop`.

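For example, a typical Terminal session that manages a background instance might look like the following
(the installation directory name is illustrative):

----
$ cd nifi-0.3.0        # or wherever NiFi was extracted
$ bin/nifi.sh start    # launch NiFi in the background
$ bin/nifi.sh status   # verify that the application is running
$ bin/nifi.sh stop     # shut the application down
----
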
=== Installing as a Service

Currently, installing NiFi as a service is supported only for Linux and Mac OSX users. To install the application
as a service, navigate to the installation directory in a Terminal window and execute the command `bin/nifi.sh install`
to install the service with the default name `nifi`. To specify a custom name for the service, execute the command
with an optional second argument that is the name of the service. For example, to install NiFi as a service with the
name `dataflow`, use the command `bin/nifi.sh install dataflow`.

Once installed, the service can be started and stopped using the appropriate commands, such as `sudo service nifi start`
and `sudo service nifi stop`. Additionally, the running status can be checked via `sudo service nifi status`.

I Started NiFi. Now What?
-------------------------

Now that NiFi has been started, we can bring up the User Interface (UI) in order to create and monitor our dataflow.
To get started, open a web browser and navigate to `http://localhost:8080/nifi`. The port can be changed by
editing the `nifi.properties` file in the NiFi `conf` directory, but the default port is 8080.

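For example, the web port is controlled by the following line in `conf/nifi.properties`, shown here with its
default value (NiFi must be restarted for a change to take effect):

----
nifi.web.http.port=8080
----
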
This will bring up the User Interface, which at this point is a blank canvas for orchestrating a dataflow:

image:new-flow.png["New Flow"]

Near the top of the UI are a few toolbars that will be very important to create your first dataflow:

image:nifi-toolbar-components.png["Toolbar Components"]

=== Adding a Processor

We can now begin creating our dataflow by adding a Processor to our canvas. To do this, drag the Processor icon
image:iconProcessor.png["Processor"] from the top-left of the screen into the middle of the canvas (the graph paper-like
background) and drop it there. This will give us a dialog that allows us to choose which Processor we want to add:

image:add-processor.png["Add Processor"]

We have quite a few options to choose from. For the sake of becoming oriented with the system, let's say that we
just want to bring in files from our local disk into NiFi. When a developer creates a Processor, the developer can
assign "tags" to that Processor. These can be thought of as keywords. We can filter by these tags or by Processor
name by typing into the Filter box in the top-right corner of the dialog. Type in the keywords that you would think
of when wanting to ingest files from a local disk. Typing in the keyword "file", for instance, will provide us with a few
different Processors that deal with files. Filtering by the term "local" will narrow down the list pretty quickly,
as well. If we select a Processor from the list,
we will see a brief description of the Processor near the bottom of the dialog. This should tell us exactly what
the Processor does. The description of the *GetFile* Processor tells us that it pulls data from our local disk
into NiFi and then removes the local file. We can then double-click the Processor type or select it and choose the
`Add` button. The Processor will be added to the canvas in the location that it was dropped.

=== Configuring a Processor

Now that we have added the GetFile Processor, we can configure it by right-clicking on the Processor and choosing
the `Configure` menu item. The provided dialog allows us to configure many different options that can be read about
in the link:user-guide.html[User Guide], but for the sake of this guide, we will focus on the Properties tab. Once
the Properties tab has been selected, we are given a list of several different properties that we can configure
for the Processor. The properties that are available depend on the type of Processor and are generally different
for each type. Properties that are in bold are required properties. The Processor cannot be started until all required
properties have been configured. The most important property to configure for GetFile is the directory from which
to pick up files. If we set the directory name to `./data-in`, this will cause the Processor to start picking up
any data in the `data-in` subdirectory of the NiFi Home directory. We can choose to configure several different
Properties for this Processor. If unsure what a particular Property does, we can hover over the help icon
(image:iconInfo.png["Help"]) next to the Property Name with the mouse in order to read a description of the property. Additionally, the
tooltip that is displayed when hovering over the help icon will provide the default value for that property,
if one exists, information about whether or not the property supports the Expression Language (see the
<<ExpressionLanguage>> section below), and previously configured values for that property.

In order for this property to be valid, create a directory named `data-in` in the NiFi home directory and then
click the `OK` button to close the dialog.

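For example, from a Terminal window in the NiFi home directory, we might create the directory and drop in a test
file for GetFile to pick up (the file name is arbitrary):

----
$ mkdir data-in
$ echo "hello nifi" > data-in/test.txt
----
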
=== Connecting Processors

Each Processor has a set of defined "Relationships" that it is able to send data to. When a Processor finishes handling
a FlowFile, it transfers it to one of these Relationships. This allows a user to configure how to handle FlowFiles based
on the result of processing. For example, many Processors define two Relationships: `success` and `failure`. Users are
then able to configure data to be routed through the flow one way if the Processor is able to successfully process
the data and route the data through the flow in a completely different manner if the Processor cannot process the
data for some reason. Or, depending on the use case, they may simply route both Relationships to the same route through
the flow.

Now that we have added and configured our GetFile Processor and applied the configuration, we can see in the
top-left corner of the Processor an Alert icon (image:iconAlert.png[Alert]) signaling that the Processor is not in a
valid state. Hovering over this icon, we can see that the `success`
relationship has not been defined. This simply means that we have not told NiFi what to do with the data that the Processor
transfers to the `success` Relationship.

In order to address this, let's add another Processor that we can connect the GetFile Processor to, by following
the same steps above. This time, however, we will simply log the attributes that exist for the FlowFile. To do this,
we will add a LogAttribute Processor.

We can now send the output of the GetFile Processor to the LogAttribute Processor. Hover over the GetFile Processor
with the mouse and a Connection Icon (image:iconConnection.png[Connection]) will appear over the middle of the
Processor. We can drag this icon from the GetFile Processor to the LogAttribute
Processor. This gives us a dialog to choose which Relationships we want to include for this connection. Because GetFile
has only a single Relationship, `success`, it is automatically selected for us.

Clicking on the Settings tab provides a handful of options for configuring how this Connection should behave:

image:connection-settings.png[Connection Settings]

We can give the Connection a name, if we like. Otherwise, the Connection name will be based on the selected Relationships.
We can also set an expiration for the data. By default, it is set to "0 sec", which indicates that the data should not
expire. However, we can change the value so that when data in this Connection reaches a certain age, it will automatically
be deleted (and a corresponding EXPIRE Provenance event will be created).

The backpressure thresholds allow us to specify how full the queue is allowed to become before the source Processor is
no longer scheduled to run. This allows us to handle cases where one Processor is capable of producing data faster than
the next Processor is capable of consuming that data. If backpressure is configured for each Connection along the way,
the Processor that is bringing data into the system will eventually experience the backpressure and stop bringing in new
data so that our system has the ability to recover.

Finally, we have the Prioritizers on the right-hand side. These allow us to control how the data in this queue is ordered.
We can drag Prioritizers from the "Available prioritizers" list to the "Selected prioritizers" list in order to activate
the prioritizer. If multiple Prioritizers are activated, they will be evaluated such that the Prioritizer listed first
will be evaluated first, and if two FlowFiles are determined to be equal according to that Prioritizer, the second Prioritizer
will be used.

For the sake of this discussion, we can simply click `Add` to add the Connection to our graph. We should now see that the Alert
icon has changed to a Stopped icon (image:iconStop.png[Stopped]). The LogAttribute Processor, however, is now invalid
because its `success` Relationship has not been connected to
anything. Let's address this by signaling that data that is routed to `success` by LogAttribute should be "Auto Terminated,"
meaning that NiFi should consider the FlowFile's processing complete and "drop" the data. To do this, we configure the
LogAttribute Processor. On the Settings tab, on the right-hand side, we can check the box next to the `success` Relationship
to Auto Terminate the data. Clicking `OK` will close the dialog and show that both Processors are now stopped.

=== Starting and Stopping Processors

At this point, we have two Processors on our graph, but nothing is happening. In order to start the Processors, we can
click on each one individually and then right-click and choose the `Start` menu item. Alternatively, we can select the first
Processor and then hold the Shift key while selecting the other Processor in order to select both. Then, we can
right-click and choose the `Start` menu item. As an alternative to using the context menu, we can select the Processors and
then click the Start icon in the toolbar at the top of the screen.

Once started, the icon in the top-left corner of the Processors will change from a Stopped icon to a Running icon. We can then
stop the Processors in the same manner that we started them, but by using the Stop icon in the toolbar or the Stop menu item
as opposed to the Start ones.

Once a Processor has started, we are not able to configure it anymore. Instead, when we right-click on the Processor, we are
given the option to view its current configuration. In order to configure a Processor, we must first stop the Processor and
wait for any tasks that may be executing to finish. The number of tasks currently executing is shown in the top-right
corner of the Processor, but nothing is shown there if there are currently no tasks.

=== Getting More Info for a Processor

With each Processor having the ability to expose multiple different Properties and Relationships, it can become quite
difficult to remember how all of the different pieces work for each Processor. To address this, you are able to right-click
on a Processor and choose the `Usage` menu item. This will provide you with the Processor's usage information, such as a
description of the Processor, the different Relationships that are available, when the different Relationships are used,
Properties that are exposed by the Processor and their documentation, as well as which FlowFile Attributes (if any) are
expected on incoming FlowFiles and which Attributes (if any) are added to outgoing FlowFiles.

=== Other Components

The toolbar that provides users the ability to drag and drop Processors onto the graph includes several other components
that can be used to build a dataflow. These components include Input and Output Ports, Funnels, Process Groups, and Remote
Process Groups. Due to the intended scope of this document, we will not discuss these elements here, but information is
readily available in the link:user-guide.html#building-dataflow[Building a Dataflow section] of the
link:user-guide.html[User Guide].

What Processors are Available
-----------------------------

In order to create an effective dataflow, users must understand what types of Processors are available to them.
NiFi contains many different Processors out of the box. These Processors provide capabilities to ingest data from
numerous different systems; route, transform, process, split, and aggregate data; and distribute data to many systems.

The number of Processors that are available increases with nearly every release of NiFi. As a result, we will not attempt
to name each of the Processors that are available, but we will highlight some of the most frequently used Processors,
categorizing them by their functions.

=== Data Transformation
- *CompressContent*: Compress or Decompress Content
- *ConvertCharacterSet*: Convert the character set used to encode the Content from one character set to another
- *EncryptContent*: Encrypt or Decrypt Content
- *ReplaceText*: Use Regular Expressions to modify textual Content
- *TransformXml*: Apply an XSLT transform to XML Content

=== Routing and Mediation
- *ControlRate*: Throttle the rate at which data can flow through one part of the flow
- *DetectDuplicate*: Monitor for duplicate FlowFiles, based on some user-defined criteria. Often used in conjunction
with HashContent
- *DistributeLoad*: Load balance or sample data by distributing only a portion of data to each user-defined Relationship
- *MonitorActivity*: Sends a notification when a user-defined period of time elapses without any data coming through a particular
point in the flow. Optionally sends a notification when dataflow resumes.
- *RouteOnAttribute*: Routes a FlowFile based on the attributes that it contains.
- *ScanAttribute*: Scans the user-defined set of Attributes on a FlowFile, checking to see if any of the Attributes match the terms
found in a user-defined dictionary.
- *RouteOnContent*: Searches the Content of a FlowFile to see if it matches any user-defined Regular Expression. If so, the FlowFile is
routed to the configured Relationship.
- *ScanContent*: Searches the Content of a FlowFile for terms that are present in a user-defined dictionary and routes based on the
presence or absence of those terms. The dictionary can consist of either textual entries or binary entries.
- *ValidateXml*: Validates XML Content against an XML Schema; routes the FlowFile based on whether or not its Content
is valid according to the user-defined XML Schema.

=== Database Access
- *ConvertJSONToSQL*: Convert a JSON document into a SQL INSERT or UPDATE command that can then be passed to the PutSQL Processor
- *ExecuteSQL*: Executes a user-defined SQL SELECT command, writing the results to a FlowFile in Avro format
- *PutSQL*: Updates a database by executing the SQL DML statement defined by the FlowFile's content

[[AttributeExtraction]]
=== Attribute Extraction
- *EvaluateJsonPath*: User supplies JSONPath Expressions (similar to XPath, which is used for XML parsing/extraction), and these Expressions
are then evaluated against the JSON Content to either replace the FlowFile Content or extract the value into the user-named Attribute.
- *EvaluateXPath*: User supplies XPath Expressions, and these Expressions are then evaluated against the XML Content to either
replace the FlowFile Content or extract the value into the user-named Attribute.
- *EvaluateXQuery*: User supplies an XQuery query, and this query is then evaluated against the XML Content to either replace the FlowFile
Content or extract the value into the user-named Attribute.
- *ExtractText*: User supplies one or more Regular Expressions that are then evaluated against the textual content of the FlowFile, and the
values that are extracted are then added as user-named Attributes.
- *HashAttribute*: Performs a hashing function against the concatenation of a user-defined list of existing Attributes.
- *HashContent*: Performs a hashing function against the content of a FlowFile and adds the hash value as an Attribute.
- *IdentifyMimeType*: Evaluates the content of a FlowFile in order to determine what type of file the FlowFile encapsulates. This Processor is
capable of detecting many different MIME Types, such as images, word processor documents, text, and compression formats, just to name
a few.
- *UpdateAttribute*: Adds or updates any number of user-defined Attributes to a FlowFile. This is useful for adding statically configured values,
as well as deriving Attribute values dynamically by using the Expression Language. This Processor also provides an "Advanced User Interface,"
allowing users to update Attributes conditionally, based on user-supplied rules.

=== System Interaction
- *ExecuteProcess*: Runs the user-defined Operating System command. The Process's StdOut is redirected such that the content that is written
to StdOut becomes the content of the outbound FlowFile. This Processor is a Source Processor - its output is expected to generate a new FlowFile,
and the system call is expected to receive no input. In order to provide input to the process, use the ExecuteStreamCommand Processor.
- *ExecuteStreamCommand*: Runs the user-defined Operating System command. The contents of the FlowFile are optionally streamed to the StdIn
of the process. The content that is written to StdOut becomes the content of the outbound FlowFile. This Processor cannot be used as a Source
Processor - it must be fed incoming FlowFiles in order to perform its work. To perform the same type of functionality with a Source Processor,
see the ExecuteProcess Processor.

=== Data Ingestion
- *GetFile*: Streams the contents of a file from a local disk (or network-attached disk) into NiFi and then deletes the original file. This
Processor is expected to move the file from one location to another location and is not to be used for copying the data.
- *GetFTP*: Downloads the contents of a remote file via FTP into NiFi and then deletes the original file. This Processor is expected to move
the data from one location to another location and is not to be used for copying the data.
- *GetSFTP*: Downloads the contents of a remote file via SFTP into NiFi and then deletes the original file. This Processor is expected to move
the data from one location to another location and is not to be used for copying the data.
- *GetJMSQueue*: Downloads a message from a JMS Queue and creates a FlowFile based on the contents of the JMS message. The JMS Properties are
optionally copied over as Attributes, as well.
- *GetJMSTopic*: Downloads a message from a JMS Topic and creates a FlowFile based on the contents of the JMS message. The JMS Properties are
optionally copied over as Attributes, as well. This Processor supports both durable and non-durable subscriptions.
- *GetHTTP*: Downloads the contents of a remote HTTP- or HTTPS-based URL into NiFi. The Processor will remember the ETag and Last-Modified Date
in order to ensure that the data is not continually ingested.
- *ListenHTTP*: Starts an HTTP (or HTTPS) Server and listens for incoming connections. For any incoming POST request, the contents of the request
are written out as a FlowFile, and a 200 response is returned.
- *ListenUDP*: Listens for incoming UDP packets and creates a FlowFile per packet or per bundle of packets (depending on configuration) and
emits the FlowFile to the 'success' relationship.
- *GetHDFS*: Monitors a user-specified directory in HDFS. Whenever a new file enters HDFS, it is copied into NiFi and deleted from HDFS. This
Processor is expected to move the file from one location to another location and is not to be used for copying the data. This Processor is also
expected to be run On Primary Node only, if run within a cluster. In order to copy the data from HDFS and leave it intact, or to stream the data
from multiple nodes in the cluster, see the ListHDFS Processor.
- *ListHDFS* / *FetchHDFS*: ListHDFS monitors a user-specified directory in HDFS and emits a FlowFile containing the filename for each file that it
encounters. It then persists this state across the entire NiFi cluster by way of a Distributed Cache. These FlowFiles can then be fanned out across
the cluster and sent to the FetchHDFS Processor, which is responsible for fetching the actual content of those files and emitting FlowFiles that contain
the content fetched from HDFS.
- *FetchS3Object*: Fetches the contents of an object from the Amazon Web Services (AWS) Simple Storage Service (S3). The outbound FlowFile contains the
contents received from S3.
- *GetKafka*: Consumes messages from Apache Kafka. The messages can be emitted as a FlowFile per message or can be batched together using a user-specified
delimiter.
- *GetMongo*: Executes a user-specified query against MongoDB and writes the contents to a new FlowFile.
- *GetTwitter*: Allows users to register a filter to listen to the Twitter "garden hose" or Enterprise endpoint, creating a FlowFile for each tweet
that is received.

=== Data Egress / Sending Data
- *PutEmail*: Sends an E-mail to the configured recipients. The content of the FlowFile is optionally sent as an attachment.
- *PutFile*: Writes the contents of a FlowFile to a directory on the local (or network attached) file system.
- *PutFTP*: Copies the contents of a FlowFile to a remote FTP Server.
- *PutSFTP*: Copies the contents of a FlowFile to a remote SFTP Server.
- *PutJMS*: Sends the contents of a FlowFile as a JMS message to a JMS broker, optionally adding JMS Properties based on Attributes.
- *PutSQL*: Executes the contents of a FlowFile as a SQL DML statement (INSERT, UPDATE, or DELETE). The contents of the FlowFile must be a valid
SQL statement. Attributes can be used as parameters so that the contents of the FlowFile can be parameterized SQL statements, avoiding
SQL injection attacks (see the sketch after this list).
- *PutKafka*: Sends the contents of a FlowFile to Kafka as a message. The FlowFile can be sent as a single message, or a delimiter, such as a
new-line, can be specified in order to send many messages for a single FlowFile.
- *PutMongo*: Sends the contents of a FlowFile to Mongo as an INSERT or an UPDATE.

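As a sketch of how PutSQL's parameterization works: the FlowFile content holds the statement with `?` placeholders, and
Attributes supply the values. The attribute naming convention shown here (`sql.args.N.type` / `sql.args.N.value`, where the
type is a `java.sql.Types` constant) follows the Processor's usage documentation; the table and values are hypothetical:

----
FlowFile content:  INSERT INTO users (name, email) VALUES (?, ?)

Attributes:        sql.args.1.type  = 12                   # 12 = VARCHAR
                   sql.args.1.value = alice
                   sql.args.2.type  = 12
                   sql.args.2.value = alice@example.com
----
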
=== Splitting and Aggregation
- *SplitText*: SplitText takes in a single FlowFile whose contents are textual and splits it into 1 or more FlowFiles based on the configured
number of lines. For example, the Processor can be configured to split a FlowFile into many FlowFiles, each of which is only 1 line.
- *SplitJson*: Allows the user to split a JSON object that is comprised of an array or many child objects into a FlowFile per JSON element.
- *SplitXml*: Allows the user to split an XML message into many FlowFiles, each containing a segment of the original. This is generally used when
several XML elements have been joined together with a "wrapper" element. This Processor then allows those elements to be split out into individual
XML elements.
- *UnpackContent*: Unpacks different types of archive formats, such as ZIP and TAR. Each file within the archive is then transferred as a single
FlowFile.
- *MergeContent*: This Processor is responsible for merging many FlowFiles into a single FlowFile. The FlowFiles can be merged by concatenating their
content together along with optional header, footer, and demarcator, or by specifying an archive format, such as ZIP or TAR. FlowFiles can be binned
together based on a common attribute, or can be "defragmented" if they were split apart by some other Splitting process. The minimum and maximum
size of each bin is user-specified, based on number of elements or total size of FlowFiles' contents, and an optional Timeout can be assigned as well,
so that FlowFiles will only wait for their bin to become full for a certain amount of time.
- *SegmentContent*: Segments a FlowFile into potentially many smaller FlowFiles based on some configured data size. The splitting is not performed
against any sort of demarcator but rather just based on byte offsets. This is used before transmitting FlowFiles in order to provide lower latency
by sending many different pieces in parallel. On the other side, these FlowFiles can then be reassembled by the MergeContent Processor using the
Defragment mode.
- *SplitContent*: Splits a single FlowFile into potentially many FlowFiles, similarly to SegmentContent. However, with SplitContent, the splitting
is not performed on arbitrary byte boundaries but rather a byte sequence is specified on which to split the content.

=== HTTP
- *GetHTTP*: Downloads the contents of a remote HTTP- or HTTPS-based URL into NiFi. The Processor will remember the ETag and Last-Modified Date
in order to ensure that the data is not continually ingested.
- *ListenHTTP*: Starts an HTTP (or HTTPS) Server and listens for incoming connections. For any incoming POST request, the contents of the request
are written out as a FlowFile, and a 200 response is returned (see the example after this list).
- *InvokeHTTP*: Performs an HTTP Request that is configured by the user. This Processor is much more versatile than the GetHTTP and PostHTTP
Processors but requires a bit more configuration. This Processor cannot be used as a Source Processor and is required to have incoming FlowFiles
in order to be triggered to perform its task.
- *PostHTTP*: Performs an HTTP POST request, sending the contents of the FlowFile as the body of the message. This is often used in conjunction
with ListenHTTP in order to transfer data between two different instances of NiFi in cases where Site-to-Site cannot be used (for instance,
when the nodes cannot access each other directly but are able to communicate through an HTTP proxy).
- *HandleHttpRequest* / *HandleHttpResponse*: The HandleHttpRequest Processor is a Source Processor that starts an embedded HTTP(S) server
similarly to ListenHTTP. However, it does not send a response to the client. Instead, the FlowFile is sent out with the body of the HTTP request
as its contents, and all of the typical Servlet parameters, headers, etc. as Attributes. The HandleHttpResponse is then able to
send a response back to the client after the FlowFile has finished being processed. These Processors are always expected to be used in conjunction
with one another and allow the user to visually create a Web Service within NiFi. This is particularly useful for adding a front-end to a
non-web-based protocol or for adding a simple web service around some functionality that is already performed by NiFi, such as data format conversion.

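As an illustration, once a ListenHTTP Processor is started (assume it is configured with a Listening Port of `8081` and a
Base Path of `contentListener`; both values are illustrative and depend on your configuration), data can be pushed into
the flow with any HTTP client:

----
$ curl -X POST --data-binary @myfile.txt http://localhost:8081/contentListener
----
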
=== Amazon Web Services
- *FetchS3Object*: Fetches the content of an object stored in Amazon Simple Storage Service (S3). The content that is retrieved from S3
is then written to the content of the FlowFile.
- *PutS3Object*: Writes the contents of a FlowFile to an Amazon S3 object using the configured credentials, key, and bucket name.
- *PutSNS*: Sends the contents of a FlowFile as a notification to the Amazon Simple Notification Service (SNS).
- *GetSQS*: Pulls a message from the Amazon Simple Queuing Service (SQS) and writes the contents of the message to the content of the FlowFile.
- *PutSQS*: Sends the contents of a FlowFile as a message to the Amazon Simple Queuing Service (SQS).
- *DeleteSQS*: Deletes a message from the Amazon Simple Queuing Service (SQS). This can be used in conjunction with GetSQS in order to receive
a message from SQS, perform some processing on it, and then delete the object from the queue only after it has successfully completed processing.

Working With Attributes
-----------------------
Each FlowFile is created with several Attributes, and these Attributes will change over the life of
the FlowFile. The concept of a FlowFile is extremely powerful and provides three primary benefits.
First, it allows the user to make routing decisions in the flow so that FlowFiles that meet some criteria
can be handled differently than other FlowFiles. This is done using the RouteOnAttribute and similar Processors.

Secondly, Attributes are used in order to configure Processors in such a way that the configuration of the
Processor is dependent on the data itself. For instance, the PutFile Processor is able to use the Attributes in order
to know where to store each FlowFile, even though the directory and filename Attributes may be different for each FlowFile.

Finally, the Attributes provide extremely valuable context about the data. This is useful when reviewing the Provenance
data for a FlowFile. This allows the user to search for Provenance data that match specific criteria, and it also allows
the user to view this context when inspecting the details of a Provenance Event. By doing this, the user is then able
to gain valuable insight as to why the data was processed one way or another, simply by glancing at this context that is
carried along with the content.

=== Common Attributes

Each FlowFile has a minimum set of Attributes:

- *filename*: A filename that can be used to store the data to a local or remote file system.
- *path*: The name of a directory that can be used to store the data to a local or remote file system.
- *uuid*: A Universally Unique Identifier that distinguishes the FlowFile from other FlowFiles in the system.
- *entryDate*: The date and time at which the FlowFile entered the system (i.e., was created). The value of this
attribute is a number that represents the number of milliseconds since midnight, Jan. 1, 1970 (UTC).
- *lineageStartDate*: Any time that a FlowFile is cloned, merged, or split, this results in a "child" FlowFile being
created. As those children are then cloned, merged, or split, a chain of ancestors is built. This value represents
the date and time at which the oldest ancestor entered the system. Another way to think about this is that this
attribute represents the latency of the FlowFile through the system. The value is a number that represents the number
of milliseconds since midnight, Jan. 1, 1970 (UTC).
- *fileSize*: This attribute represents the number of bytes taken up by the FlowFile's Content.

Note that the `uuid`, `entryDate`, `lineageStartDate`, and `fileSize` attributes are system-generated and cannot be changed.

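Because `entryDate` and `lineageStartDate` are stored as milliseconds since epoch, the Expression Language's date
formatting (covered in the <<ExpressionLanguage>> section below) is handy for rendering them readably; an expression
along these lines should produce a human-readable timestamp, relying on the EL's coercion of numbers to dates:

----
${entryDate:format('yyyy/MM/dd HH:mm:ss')}
----
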
=== Extracting Attributes

NiFi provides several different Processors out of the box for extracting Attributes from FlowFiles. A list of commonly used
Processors for this purpose can be found above in the <<AttributeExtraction>> section. This is a very common use case for building
custom Processors, as well. Many Processors are written to understand a specific data format and extract pertinent information from
a FlowFile's content, creating Attributes to hold that information, so that decisions can then be made about how to route or
process the data.

=== Adding User-Defined Attributes

In addition to having Processors that are able to extract particular pieces of information from FlowFile content into Attributes,
it is also common for users to want to add their own user-defined Attributes to each FlowFile at a particular place in the flow.
The UpdateAttribute Processor is designed specifically for this purpose. Users are able to add a new property to the Processor
in the Configure dialog by clicking the "New Property" button in the top-right corner of the Properties tab. The user is then
prompted to enter the name of the property and then a value. For each FlowFile that is processed by this UpdateAttribute
Processor, an Attribute will be added for each user-defined property. The name of the Attribute will be the same as the name of
the property that was added. The value of the Attribute will be the same as the value of the property.

The value of the property may contain the Expression Language, as well. This allows Attributes to be modified or added
based on other Attributes. For example, if we want to prepend the hostname that is processing a file, as well as the date, to
a filename, we could do this by adding a property with the name `filename` and the value `${hostname()}-${now():format('yyyy-dd-MM')}-${filename}`.
While this may seem confusing at first, the section below on <<ExpressionLanguage>> will help to clear up what is going on
here.

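To make the example concrete, here is how that property might play out for a single FlowFile (the hostname, date, and
incoming filename are hypothetical; note that the `'yyyy-dd-MM'` format string from the example renders as year-day-month):

----
Property name      : filename
Property value     : ${hostname()}-${now():format('yyyy-dd-MM')}-${filename}

Incoming filename  : data.csv          (hypothetical)
Hostname           : nifi-01           (hypothetical)
Date               : Sept. 25, 2015    (hypothetical)
Resulting filename : nifi-01-2015-25-09-data.csv
----
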
In addition to always adding a defined set of Attributes, the UpdateAttribute Processor has an Advanced UI that allows the user
to configure a set of rules for which Attributes should be added when. To access this capability, in the Configure dialog's
Properties tab, click the `Advanced...` button at the bottom of the dialog. This will provide a UI that is tailored specifically
to this Processor, rather than the simple Properties table that is provided for all Processors. Within this UI, the user is able
to configure, essentially, a rules engine, specifying rules that must match in order to have the configured Attributes added
to the FlowFile.

=== Routing on Attributes

One of the most powerful features of NiFi is the ability to route FlowFiles based on their Attributes. The primary mechanism
for doing this is the RouteOnAttribute Processor. This Processor, like UpdateAttribute, is configured by adding user-defined properties.
Any number of properties can be added by clicking the "New Property" icon in the top-right corner of the Properties tab in the
Processor's Configure dialog.

Each FlowFile's Attributes will be compared against the configured properties to determine whether or not the FlowFile meets the
specified criteria. The value of each property is expected to be an Expression Language expression that returns a boolean value.
For more on the Expression Language, see the <<ExpressionLanguage>> section below.

After evaluating the Expression Language expressions provided against the FlowFile's Attributes, the Processor determines how to
route the FlowFile based on the Routing Strategy selected. The most common strategy is the "Route to Property name" strategy. With this
strategy selected, the Processor will expose a Relationship for each property configured. If the FlowFile's Attributes satisfy the given
expression, a copy of the FlowFile will be routed to the corresponding Relationship. For example, if we had a new property with the name
"begins-with-r" and the value "${filename:startsWith('r')}", then any FlowFile whose filename starts with the letter 'r' will be routed
to that Relationship. All other FlowFiles will be routed to 'unmatched'.

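A minimal sketch of such a configuration, assuming the "Route to Property name" strategy and two hypothetical
user-defined properties (each property name becomes a Relationship):

----
Routing Strategy  = Route to Property name

begins-with-r     = ${filename:startsWith('r')}
larger-than-1kb   = ${fileSize:toNumber():gt(1024)}
----

With this configuration, the Processor exposes `begins-with-r`, `larger-than-1kb`, and `unmatched` Relationships.
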
[[ExpressionLanguage]]
=== Expression Language / Using Attributes in Property Values

As we extract Attributes from FlowFiles' contents and add user-defined Attributes, they don't do us much good as an operator unless
we have some mechanism by which we can use them. The NiFi Expression Language allows us to access and manipulate FlowFile Attribute
values as we configure our flows. Not all Processor properties allow the Expression Language to be used, but many do. In order to
determine whether or not a property supports the Expression Language, a user can hover over the Help icon
(image:iconInfo.png["Help Icon"]) in the Properties tab of the Processor Configure dialog. This will provide a tooltip that shows a
description of the property, the
default value, if any, and whether or not the property supports the Expression Language.

For properties that do support the Expression Language, it is used by adding an expression within the opening `${` tag and the closing
`}` tag. An expression can be as simple as an attribute name. For example, to reference the `uuid` Attribute, we can simply use the
value `${uuid}`. If the Attribute name begins with any character other than a letter, or if it contains a character other than
a number, a letter, a period (.), or an underscore (_), the Attribute name will need to be quoted. For example, `${My Attribute Name}`
will be invalid, but `${'My Attribute Name'}` will refer to the Attribute `My Attribute Name`.

In addition to referencing Attribute values, we can perform a number of functions and comparisons on those Attributes. For example,
if we want to check if the `filename` attribute contains the letter 'r' without paying attention to case (upper case or lower case),
we can do this by using the expression `${filename:toLower():contains('r')}`. Note here that the functions are separated by colons.
We can chain together any number of functions to build up more complex expressions. It is also important to understand here that even
though we are calling `filename:toLower()`, this does not alter the value of the `filename` Attribute in any way but rather just gives
us a new value to work with.

We can also embed one expression within another. For example, if we wanted to compare the value of the `attr1` Attribute to
the value of the `attr2` Attribute, we can do this with the following expression: `${attr1:equals( ${attr2} )}`.

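Putting the syntax rules together, here are a few illustrative expressions, assuming a FlowFile that has `filename`,
`attr1`, and `attr2` Attributes:

----
${filename}                           the value of the filename Attribute
${filename:toLower():contains('r')}   true if the lower-cased filename contains 'r'
${'My Attribute Name'}                quoting is required when a name contains spaces
${attr1:equals( ${attr2} )}           true if attr1 and attr2 hold the same value
----
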
The Expression Language contains many different functions that can be used in order to perform the tasks needed for routing and manipulating
Attributes. Functions exist for parsing and manipulating strings, comparing string and numeric values, manipulating and replacing values,
and comparing values. A full explanation of the different functions available is out of scope of this document, but the
link:expression-language-guide.html[Expression Language Guide] provides far greater detail for each of the functions.

In addition, this Expression Language guide is built into the application so that users are able to easily see which functions are available
and see their documentation while typing. When setting the value of a property that supports the Expression Language, if the cursor is within
the Expression Language start and end tags, pressing Ctrl + Space on the keyboard will provide a popup of all of the available functions and
will provide auto-complete functionality. Clicking on or using the keyboard to navigate to one of the functions listed in the popup will
cause a tooltip to show, which explains what the function does, the arguments that it expects, and the return type of the function.

Working With Templates
----------------------

As we use Processors to build more and more complex dataflows in NiFi, we often will find that we string together the same sequence
of Processors to perform some task. This can become tedious and inefficient. To address this, NiFi provides a concept of Templates.
A template can be thought of as a reusable sub-flow. To create a template, follow these steps:

- Select the components to include in the template. We can select multiple components by clicking on the first component and then holding
the Shift key while selecting additional components (to include the Connections between those components), or by holding the Shift key
while dragging a box around the desired components on the canvas.
- Select the Create Template Icon (image:iconTemplate.png[Template Icon]) from the middle toolbar at the top of the screen.
- Provide a name and, optionally, comments about the template.
- Click the Create button.

Once we have created a template, we can now use it as a building block in our flow, just as we would a Processor. To do this, we will
click and drag the Template icon from the left-most toolbar onto our canvas. We can then choose the template that we would like to add
to our canvas and click the Add button.

Finally, we have the ability to manage our templates by using the Template Management dialog. To access this dialog, click the Template
icon in the top-right toolbar. From here, we can see which templates exist and filter the templates to find the templates of interest.
On the right-hand side of the table is an icon to Export, or Download, the template as an XML file. This can then be provided to others so
that they can use your template.

To import a template into your NiFi instance, click the Browse button in the top-right corner of the dialog and navigate to the file on
your computer. Then click the Import button. The template will now show up in your table, and you can drag it onto your canvas as you would
any other template that you have created.

There are a few important notes to remember when working with templates:

- Any properties that are identified as being Sensitive Properties (such as a password that is configured in a Processor) will not be added
to the template. These sensitive properties will have to be populated each time that the template is added to the canvas.
- If a component that is included in the template references a Controller Service, the Controller Service will also be added to the template.
This means that each time that the template is added to the graph, it will create a copy of the Controller Service.

Monitoring NiFi
---------------

As data flows through your dataflow in NiFi, it is important to understand how well your system is performing in order to assess whether you
will require more resources and in order to assess the health of your current resources. NiFi provides a few mechanisms for monitoring
your system.

=== Status Bar

Near the top of the NiFi screen is a blue bar that is referred to as the Status Bar. It contains a few important statistics about the current
health of NiFi. The number of Active Threads can indicate how hard NiFi is currently working, and the Queued stat indicates how many FlowFiles
are currently queued across the entire flow, as well as the total size of those FlowFiles.

If the NiFi instance is in a cluster, we will also see an indicator here telling us how many nodes are in the cluster and how many are currently
connected. In this case, the number of active threads and the queue size are the sum across all nodes that are currently connected.

=== Component Statistics

Each Processor, Process Group, and Remote Process Group on the canvas provides several statistics about how much data has been processed
by the component. These statistics provide information about how much data has been processed in the past five minutes. This is a rolling
window and allows us to see things like the number of FlowFiles that have been consumed by a Processor, as well as the number of FlowFiles
that have been emitted by the Processor.

The connections between Processors also expose the number of items that are currently queued.

It may also be valuable to see historical values for these metrics and, if clustered, how the different nodes compare to one another.
In order to see this information, we can right-click on a component and choose the Status menu item. This will show us a graph that spans
the time since NiFi was started, or up to 24 hours, whichever is less. The amount of time that is shown here can be extended or reduced
by changing the configuration in the properties file.

In the top-right corner is a drop-down that allows the user to select which metric they are viewing. The graph on the bottom allows the
user to select a smaller portion of the graph in order to zoom in.

=== Bulletins

In addition to the statistics provided by each component, we as users will want to know if any problems occur. While we could monitor the
logs for anything interesting, it is much more convenient to have notifications pop up on the screen. If a Processor logs
anything as a WARNING or ERROR, we will see a "Bulletin Indicator" show up in the top-left-hand corner of the Processor. This indicator
looks like a sticky note and will be shown for five minutes after the event occurs. Hovering over the bulletin provides information about
what happened so that the user does not have to sift through log messages to find it. If in a cluster, the bulletin will also indicate which
node in the cluster emitted the bulletin. We can also change the log level at which bulletins will occur in the Settings tab of the Configure
dialog for a Processor.

If the framework emits a bulletin, we will also see this bulletin indicator occur in the Status Bar at the top of the screen.
The right-most icon in the NiFi Toolbar is the Bulletin Board icon. Clicking this icon will take us to the bulletin board where
we can see all bulletins that occur across the NiFi instance and can filter based on the component, the message, etc.

Data Provenance
---------------

NiFi keeps a very granular level of detail about each piece of data that it ingests. As the data is processed through
the system and is transformed, routed, split, aggregated, and distributed to other endpoints, this information is
all stored within NiFi's Provenance Repository. In order to search and view this information, we can click the Data Provenance icon (
image:iconProvenance.png[Data Provenance, width=28]) in the top-right corner of the canvas. This will provide us a table that lists
the Provenance events that we have searched for:

image:provenance-table.png[Provenance Table]

Initially, this table is populated with the most recent 1,000 Provenance Events that have occurred (though it may take a few
seconds for the information to be processed after the events occur). From this dialog, there is a Search button that allows the
user to search for events that happened by a particular Processor, for a particular FlowFile by filename or UUID, or by several other
fields. The `nifi.properties` file provides the ability to configure which of these properties are indexed, or made searchable.
Additionally, the properties file also allows you to choose specific FlowFile Attributes that will be indexed. As a result, you can
choose which Attributes will be important to your specific dataflows and make those Attributes searchable.

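For example, the provenance indexing configuration lives in `conf/nifi.properties`; the values below are illustrative,
with the attribute list showing two hypothetical custom Attributes being made searchable:

----
nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID
nifi.provenance.repository.indexed.attributes=customer.id, batch.id
----
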
[[EventDetails]]
=== Event Details
Once we have performed our search, our table will be populated only with the events that match the search criteria. From here, we
can choose the Info icon (image:iconInfo.png[Info Icon]) on the left-hand side of the table to view the details of that event:

image:event-details.png[Event Details]

From here, we can see exactly when that event occurred, which FlowFile the event affected, which component (Processor, etc.) performed the event,
how long the event took, and the overall time that the data had been in NiFi when the event occurred (total latency).

The next tab provides a listing of all Attributes that existed on the FlowFile at the time that the event occurred:

image:event-attributes.png[Event Attributes]

From here, we can see all the Attributes that existed on the FlowFile when the event occurred, as well as the previous values for those
Attributes. This allows us to know which Attributes changed as a result of this event and how they changed. Additionally, in the right-hand
corner is a checkbox that allows the user to see only those Attributes that changed. This may not be particularly useful if the FlowFile has
only a handful of Attributes, but can be very helpful when a FlowFile has hundreds of Attributes.

This is very important, because it allows the user to understand the exact context in which the FlowFile was processed. It is very helpful
for understanding 'why' the FlowFile was processed the way that it was, especially when the Processor was configured using the Expression Language.

Finally, we have the Content tab:

image:event-content.png[Event Content]

This tab provides us information about where in the Content Repository the FlowFile's content was stored. If the event modified the content
of the FlowFile, we will see the 'before' and 'after' content claims. We are then given the option to Download the content or to View the
content within NiFi itself, if the data format is one that NiFi understands how to render.

Additionally, there is a 'Replay' button that allows the user to re-insert the FlowFile into the flow and re-process it from exactly the point
at which the event happened. This provides a very powerful mechanism, as we are able to modify our flow in real time, re-process a FlowFile,
and then view the results. If they are not as expected, we can modify the flow again and re-process the FlowFile again. We are able to perform
this iterative development of the flow until it is processing the data exactly as intended.

=== Lineage Graph

In addition to viewing the details of a Provenance event, we can also view the lineage of the FlowFile involved by clicking on the Lineage Icon
(image:iconLineage.png[Lineage]) from the table view.

This provides us with a graphical representation of exactly what happened to that piece of data as it traversed the system:

image:lineage-graph-annotated.png[Lineage Graph]

From here, we can right-click on any of the events represented and click the "View Details" menu item to see the <<EventDetails>>.
This graphical representation shows us exactly which events occurred to the data. There are a few "special" event types to be
aware of. If we see a JOIN, FORK, or CLONE event, we can right-click and choose to Find Parents or Expand. This allows us to
see the lineage of the parent FlowFiles and the child FlowFiles that were created, as well.

The slider in the bottom-left corner allows us to see the time at which these events occurred. By sliding it left and right, we can
see which events introduced latency into the system so that we have a very good understanding of where in our system we may need to
provide more resources, such as the number of Concurrent Tasks for a Processor. Or it may reveal, for example, that most of the latency
was introduced by a JOIN event, in which we were waiting for more FlowFiles to join together. In either case, the ability to easily
see where this is occurring is a very powerful feature that will help users to understand how the enterprise is operating.

Where To Go For More Information
--------------------------------

The NiFi community has built up a significant amount of documentation on how to use the software. The following guides are available, in
addition to this Getting Started Guide:

- link:overview.html[Apache NiFi Overview] - Provides an overview of what Apache NiFi is, what it does, and why it was created.
- link:user-guide.html[Apache NiFi User Guide] - A fairly extensive guide that is often used more as a Reference Guide, as it is pretty
lengthy, discussing all of the different components that comprise the application. This guide is written with the NiFi Operator as its
audience. It provides information on each of the different components available in NiFi and explains how to use the different features
provided by the application.
- link:administration-guide.html[Administration Guide] - A guide for setting up and administering Apache NiFi for production environments.
This guide provides information about the different system-level settings, such as setting up clusters of NiFi and securing access to the
web UI and data.
- link:expression-language-guide.html[Expression Language Guide] - A far more exhaustive guide for understanding the Expression Language than
is provided above. This guide is the definitive documentation for the NiFi Expression Language. It provides an introduction to the EL
and an explanation of each function, its arguments, and return types, as well as providing examples.
- link:developer-guide.html[Developer's Guide] - While not an exhaustive guide to All Things NiFi Development, this guide does provide a
comprehensive overview of the different APIs available and how they should be used. In addition, it provides Best Practices for developing
NiFi components and common Processor idioms to help aid in understanding the logic behind many of the existing NiFi components.
- link:https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide[Contributor's Guide] - A guide explaining how to contribute
work back to the Apache NiFi community so that others can make use of it.

Several blog postings have also been added to the Apache NiFi blog site:
link:https://blogs.apache.org/nifi/[https://blogs.apache.org/nifi/]

In addition to the blog and guides provided here, you can browse the different
link:https://nifi.apache.org/mailing_lists.html[NiFi Mailing Lists] or send an e-mail to one of the mailing lists at
link:mailto:users@nifi.apache.org[users@nifi.apache.org] or
link:mailto:dev@nifi.apache.org[dev@nifi.apache.org].

Many of the members of the NiFi community are also available on Twitter and actively monitor for tweets that mention @apachenifi.