From fa2a01c823fcfec89487e4435ca4ea47bc2d04cf Mon Sep 17 00:00:00 2001
From: Michael Moser
Date: Fri, 11 Oct 2024 17:06:00 -0400
Subject: [PATCH] NIFI-13854 Updated Getting Started Guide for 2.0.0 (#9362)

Signed-off-by: David Handermann
---
 .../src/main/asciidoc/getting-started.adoc | 146 +++++++-----------
 .../src/main/asciidoc/images/iconInfo2.png | Bin 0 -> 301 bytes
 .../src/main/asciidoc/images/iconKebab.png | Bin 0 -> 166 bytes
 3 files changed, 58 insertions(+), 88 deletions(-)
 create mode 100644 nifi-docs/src/main/asciidoc/images/iconInfo2.png
 create mode 100644 nifi-docs/src/main/asciidoc/images/iconKebab.png

diff --git a/nifi-docs/src/main/asciidoc/getting-started.adoc b/nifi-docs/src/main/asciidoc/getting-started.adoc
index c0fc263807..8c63a854ea 100644
--- a/nifi-docs/src/main/asciidoc/getting-started.adoc
+++ b/nifi-docs/src/main/asciidoc/getting-started.adoc
@@ -16,7 +16,7 @@
//
= Getting Started with Apache NiFi
Apache NiFi Team
-:homepage: http://nifi.apache.org
+:homepage: https://nifi.apache.org
:linkattrs:

@@ -60,16 +60,8 @@ dataflows.

WARNING: Before proceeding, check the Admin Guide to confirm you have the minimum system requirements to install and run NiFi.

-NiFi can be downloaded from the link:http://nifi.apache.org/download.html[NiFi Downloads page^]. There are two packaging options
-available:
-
-- a "tarball" (tar.gz) that is tailored more to Linux
-- a zip file that is more applicable for Windows users
-
-macOS users may also use the tarball or can install via link:https://brew.sh[Homebrew^] by simply running the command `brew install nifi` from the command line terminal.
-
-For users who are not running macOS or do not have Homebrew installed, after downloading the version of NiFi that you
-would like to use, simply extract the archive to the location that you wish to run the application from.
+NiFi can be downloaded from the link:https://nifi.apache.org/download/[NiFi Downloads page^]. After downloading the version of NiFi
+that you would like to use, simply extract the zip archive to the location that you wish to run the application from.

For information on how to configure the instance of NiFi (for example, to configure security, data storage
configuration, or the port that NiFi is running on), see the link:administration-guide.html[Admin Guide].

@@ -83,7 +75,7 @@ appropriate for your operating system.
=== For Windows Users

For Windows users, navigate to the folder where NiFi was installed. Within this folder is a subfolder
-named `bin`. Navigate to this subfolder and run `nifi.cmd start` file.
+named `bin`. Navigate to this subfolder and run `nifi.cmd start`.

This will launch NiFi and leave it running in the foreground. To shut down NiFi, select the window that
was launched and hold the Ctrl key while pressing C.

@@ -104,18 +96,6 @@ be used.

If NiFi was installed with Homebrew, run the commands `nifi start` or `nifi stop` from anywhere in your file system to start or stop NiFi.

-=== Installing as a Service
-
-Currently, installing NiFi as a service is supported only for Linux and macOS users. To install the application
-as a service, navigate to the installation directory in a Terminal window and execute the command `bin/nifi.sh install`
-to install the service with the default name `nifi`. To specify a custom name for the service, execute the command
-with an optional second argument that is the name of the service. For example, to install NiFi as a service with the
-name `dataflow`, use the command `bin/nifi.sh install dataflow`.
-
-Once installed, the service can be started and stopped using the appropriate commands, such as `sudo service nifi start`
-and `sudo service nifi stop`. Additionally, the running status can be checked via `sudo service nifi status`.
-
-
== I Started NiFi. Now What?

@@ -188,14 +168,16 @@ for the Processor. The properties that are available depend on the type of Proce
for each type. Properties that are in bold are required properties. The Processor cannot be started until all required
properties have been configured. The most important property to configure for GetFile is the directory from which to pick up
files. If we set the directory name to `./data-in`, this will cause the Processor to start picking up
-any data in the `data-in` subdirectory of the NiFi Home directory. We can choose to configure several different
-Properties for this Processor. If unsure what a particular Property does, we can hover over the Help icon (
-image:iconInfo.png["Help"]
+any data in the `data-in` subdirectory of the NiFi Home directory.
+
+We can choose to configure several different Properties for this Processor.
+If unsure what a particular Property does, we can hover over the Info icon (
+image:iconInfo2.png["Info"]
) next to the Property Name with the mouse in order to read a description of the property. Additionally, the
-tooltip that is displayed when hovering over the Help icon will provide the default value for that property,
-if one exists, information about whether or not the property supports the Expression Language (see the
-<<ExpressionLanguage>> section below), and previously configured values for that property.
+tooltip that is displayed will provide the default value for that property if one exists,
+information about whether the property supports the Expression Language (see the <<ExpressionLanguage>> section below),
+whether the property is sensitive and will be encrypted at rest, and history of previously configured values for that property.

In order for this property to be valid, create a directory named `data-in` in the NiFi home directory and then click the `Ok` button to close the dialog.

@@ -220,7 +202,7 @@ transfers to the `success` Relationship.

In order to address this, let's add another Processor that we can connect the GetFile Processor to, by following
the same steps above. This time, however, we will simply log the attributes that exist for the FlowFile. To do this,
-we will add a LogAttributes Processor.
+we will add a LogAttribute Processor.

We can now send the output of the GetFile Processor to the LogAttribute Processor. Hover over the GetFile
Processor with the mouse and a Connection Icon (
@@ -256,8 +238,8 @@ image:iconStop.png[Stopped]
). The LogAttribute Processor, however, is now invalid because its `success` Relationship has not been connected to anything.
Let's address this by signaling that data that is routed to `success` by LogAttribute should be "Auto Terminated," meaning that
NiFi should consider the FlowFile's processing complete and "drop" the data. To do this, we configure the
-LogAttribute Processor. On the Settings tab, in the right-hand side we can check the box next to the `success` Relationship
-to Auto Terminate the data. Clicking `OK` will close the dialog and show that both Processors are now stopped.
+LogAttribute Processor. On the Relationships tab, we can check the `terminate` box next to the `success` Relationship
+to Auto Terminate the data. Clicking the `Apply` button will close the dialog and show that both Processors are now stopped.
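With both Processors configured, the flow can be exercised from a terminal. A minimal sketch, assuming a Linux or macOS shell, the NiFi home directory as the working directory, and the default `logs/nifi-app.log` location (`hello.txt` is only an example file name; on Windows, use `bin\nifi.cmd start` instead of `bin/nifi.sh start`):

[source,shell]
----
# Start NiFi if it is not already running
./bin/nifi.sh start

# Create the input directory that the GetFile Processor watches
mkdir -p ./data-in

# Drop a sample file; once both Processors are started, GetFile picks it
# up (removing it from data-in) and LogAttribute logs its attributes
echo "hello nifi" > ./data-in/hello.txt

# Watch the application log for the attribute listing from LogAttribute
tail -f logs/nifi-app.log
----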
=== Starting and Stopping Processors

@@ -281,10 +263,11 @@ corner of the Processor, but nothing is shown there if there are currently no ta

With each Processor having the ability to expose multiple different Properties and Relationships, it can be challenging
to remember how all of the different pieces work for each Processor. To address this, you are able to right-click
-on a Processor and choose the `Usage` menu item. This will provide you with the Processor's usage information, such as a
+on a Processor and choose the `View Documentation` menu item. This will provide you with the Processor's usage information, such as a
description of the Processor, the different Relationships that are available, when the different Relationships are used,
Properties that are exposed by the Processor and their documentation, as well as which FlowFile Attributes (if any) are
-expected on incoming FlowFiles and which Attributes (if any) are added to outgoing FlowFiles.
+expected on incoming FlowFiles and which Attributes (if any) are added to outgoing FlowFiles. Some Processors also describe
+the specific configurations needed to accomplish common use cases.

=== Other Components

@@ -310,7 +293,7 @@ categorizing them by their functions.
=== Data Transformation
- *CompressContent*: Compress or Decompress Content
- *ConvertCharacterSet*: Convert the character set used to encode the content from one character set to another
-- *EncryptContent*: Encrypt or Decrypt Content
+- *EncryptContentAge* / *EncryptContentPGP*: Encrypt or Decrypt Content
- *ReplaceText*: Use Regular Expressions to modify textual Content
- *TransformXml*: Apply an XSLT transform to XML Content
- *JoltTransformJSON*: Apply a JOLT specification to transform JSON Content

@@ -318,7 +301,7 @@ categorizing them by their functions.
=== Routing and Mediation
- *ControlRate*: Throttle the rate at which data can flow through one part of the flow
- *DetectDuplicate*: Monitor for duplicate FlowFiles, based on some user-defined criteria. Often used in conjunction
-  with HashContent
+  with CryptographicHashContent
- *DistributeLoad*: Load balance or sample data by distributing only a portion of data to each user-defined Relationship
- *MonitorActivity*: Sends a notification when a user-defined period of time elapses without any data coming through
  a particular point in the flow. Optionally send a notification when dataflow resumes.

@@ -335,8 +318,6 @@ categorizing them by their functions.
=== Database Access
- *ExecuteSQL*: Executes a user-defined SQL SELECT command, writing the results to a FlowFile in Avro format
- *PutSQL*: Updates a database by executing the SQL DML statement defined by the FlowFile's content
-- *SelectHiveQL*: Executes a user-defined HiveQL SELECT command against an Apache Hive database, writing the results to a FlowFile in Avro or CSV format
-- *PutHiveQL*: Updates a Hive database by executing the HiveQL DML statement defined by the FlowFile's content

[[AttributeExtraction]]
=== Attribute Extraction
@@ -348,8 +329,7 @@ categorizing them by their functions.
  Content or extract the value into the user-named Attribute.
- *ExtractText*: User supplies one or more Regular Expressions that are then evaluated against the textual content of the
  FlowFile, and the values that are extracted are then added as user-named Attributes.
-- *HashAttribute*: Performs a hashing function against the concatenation of a user-defined list of existing Attributes.
-- *HashContent*: Performs a hashing function against the content of a FlowFile and adds the hash value as an Attribute.
+- *CryptographicHashContent*: Performs a hashing function against the content of a FlowFile and adds the hash value as an Attribute.
- *IdentifyMimeType*: Evaluates the content of a FlowFile in order to determine what type of file the FlowFile encapsulates. This Processor is capable of detecting many different MIME Types, such as images, word processor documents, text, and compression formats just to name a few.

@@ -374,12 +354,9 @@ categorizing them by their functions.
  the data from one location to another location and is not to be used for copying the data.
- *GetSFTP*: Downloads the contents of a remote file via SFTP into NiFi and then deletes the original file. This Processor is expected to move the data from one location to another location and is not to be used for copying the data.
-- *GetJMSQueue*: Downloads a message from a JMS Queue and creates a FlowFile based on the contents of the JMS message. The JMS Properties are
-  optionally copied over as Attributes, as well.
-- *GetJMSTopic*: Downloads a message from a JMS Topic and creates a FlowFile based on the contents of the JMS message. The JMS Properties are
-  optionally copied over as Attributes, as well. This Processor supports both durable and non-durable subscriptions.
-- *GetHTTP*: Downloads the contents of a remote HTTP- or HTTPS-based URL into NiFi. The Processor will remember the ETag and Last-Modified Date
-  in order to ensure that the data is not continually ingested.
+- *ConsumeJMS*: Downloads a message from a JMS Queue or Topic and creates a FlowFile based on the contents of the JMS message. The JMS Properties are
+  optionally copied over as Attributes, as well. This Processor also supports durable topic subscriptions.
+- *InvokeHTTP*: Can download data from a remote HTTP server. See the <<HTTP>> section below.
- *ListenHTTP*: Starts an HTTP (or HTTPS) Server and listens for incoming connections. For any incoming POST request, the contents of the request are written out as a FlowFile, and a 200 response is returned.
- *ListenUDP*: Listens for incoming UDP packets and creates a FlowFile per packet or per bundle of packets (depending on configuration) and
- *GetHDFS*: Monitors a user-specified directory in HDFS. Whenever a new file enters HDFS, it is copied into NiFi and deleted from HDFS. This Processor is expected to move the file from one location to another location and is not to be used for copying the data. This Processor is also expected to be run On Primary Node only, if run within a cluster. In order to copy the data from HDFS and leave it intact, or to stream the data
-  from multiple nodes in the cluster, see the ListHDFS Processor.
+  from multiple nodes in the cluster, see the ListHDFS Processor. _HDFS components are available via NiFi plugin extension._
- *ListHDFS* / *FetchHDFS*: ListHDFS monitors a user-specified directory in HDFS and emits a FlowFile containing the filename for each file that it encounters. It then persists this state across the entire NiFi cluster by way of a Distributed Cache. These FlowFiles can then be fanned out across the cluster and sent to the FetchHDFS Processor, which is responsible for fetching the actual content of those files and emitting FlowFiles that contain
-  the content fetched from HDFS.
_HDFS components are available via NiFi plugin extension._
- *FetchS3Object*: Fetches the contents of an object from the Amazon Web Services (AWS) Simple Storage Service (S3). The outbound FlowFile contains the contents received from S3.
-- *GetKafka*: Fetches messages from Apache Kafka, specifically for 0.8.x versions. The messages can be emitted as a FlowFile per message or can be batched together using a user-specified delimiter.
+- *ConsumeKafka*: Fetches messages from Apache Kafka. The messages can be emitted as a FlowFile per message or can be batched together using a user-specified delimiter.
- *GetMongo*: Executes a user-specified query against MongoDB and writes the contents to a new FlowFile.
-- *GetTwitter*: Allows Users to register a filter to listen to the Twitter "garden hose" or Enterprise endpoint, creating a FlowFile for each tweet
+- *ConsumeTwitter*: Allows Users to register a filter to listen to the X/Twitter "garden hose" or Enterprise endpoint, creating a FlowFile for each post
  that is received.

=== Data Egress / Sending Data
@@ -404,12 +381,13 @@ categorizing them by their functions.
- *PutFile*: Writes the contents of a FlowFile to a directory on the local (or network attached) file system.
- *PutFTP*: Copies the contents of a FlowFile to a remote FTP Server.
- *PutSFTP*: Copies the contents of a FlowFile to a remote SFTP Server.
-- *PutJMS*: Sends the contents of a FlowFile as a JMS message to a JMS broker, optionally adding JMS Properties based on Attributes.
+- *InvokeHTTP*: Sends the contents of a FlowFile to a remote HTTP server. See the <<HTTP>> section below.
+- *PublishJMS*: Sends the contents of a FlowFile as a JMS message to a JMS broker, optionally adding JMS Properties based on Attributes.
- *PutSQL*: Executes the contents of a FlowFile as a SQL DML Statement (INSERT, UPDATE, or DELETE). The contents of the FlowFile must be a valid SQL statement. Attributes can be used as parameters so that the contents of the FlowFile can be parameterized SQL statements in order to avoid SQL injection attacks.
-- *PutKafka*: Sends the contents of a FlowFile as a message to Apache Kafka, specifically for 0.8.x versions. The FlowFile can be sent as a single message or a delimiter, such as a
-  new-line can be specified, in order to send many messages for a single FlowFile.
+- *PublishKafka*: Sends the contents of a FlowFile as a message to Apache Kafka. The FlowFile can be sent as a single message, or a delimiter, such as a
+  new-line, can be specified in order to send many messages for a single FlowFile.
- *PutMongo*: Sends the contents of a FlowFile to Mongo as an INSERT or an UPDATE.

=== Splitting and Aggregation
@@ -433,18 +411,12 @@ categorizing them by their functions.
- *SplitContent*: Splits a single FlowFile into potentially many FlowFiles, similarly to SegmentContent. However, with SplitContent, the splitting is not performed on arbitrary byte boundaries but rather a byte sequence is specified on which to split the content.

+[[HTTP]]
=== HTTP
-- *GetHTTP*: Downloads the contents of a remote HTTP- or HTTPS-based URL into NiFi. The Processor will remember the ETag and Last-Modified Date
-  in order to ensure that the data is not continually ingested.
- *ListenHTTP*: Starts an HTTP (or HTTPS) Server and listens for incoming connections. For any incoming POST request, the contents of the request are written out as a FlowFile, and a 200 response is returned.
-- *InvokeHTTP*: Performs an HTTP Request that is configured by the user.
This Processor is much more versatile than the GetHTTP and PostHTTP
- but requires a bit more configuration. This Processor cannot be used as a Source Processor and is required to have incoming FlowFiles in order
- to be triggered to perform its task.
-- *PostHTTP*: Performs an HTTP POST request, sending the contents of the FlowFile as the body of the message. This is often used in conjunction
- with ListenHTTP in order to transfer data between two different instances of NiFi in cases where Site-to-Site cannot be used (for instance,
- when the nodes cannot access each other directly and are able to communicate through an HTTP proxy).
- *Note*: HTTP is available as a link:user-guide.html#site-to-site[Site-to-Site] transport protocol in addition to the existing RAW socket transport. It also supports HTTP Proxy. Using HTTP Site-to-Site is recommended since it's more scalable, and can provide bi-directional data transfer using input/output ports with better user authentication and authorization.
+- *InvokeHTTP*: Can send a wide variety of HTTP Requests to a server, as configured by the user. A GET request can download data from an HTTP server.
+  A POST request can send the contents of a FlowFile in the body of the request to an HTTP server.
- *HandleHttpRequest* / *HandleHttpResponse*: The HandleHttpRequest Processor is a Source Processor that starts an embedded HTTP(S) server similarly to ListenHTTP. However, it does not send a response to the client. Instead, the FlowFile is sent out with the body of the HTTP request as its contents and attributes for all of the typical Servlet parameters, headers, etc. as Attributes. The HandleHttpResponse then is able to

@@ -522,8 +494,8 @@ While this may seem confusing at first, the section below on
-When broken down by node in the cluster (Global Menu -> Cluster), the active thread count is more comprehensive and includes these as well as any
-other threads (reporting tasks, controller services, etc.)
+It is important to note that the active thread count only captures threads used by Processors that are on the graph.
+When broken down by node in the cluster (Global Menu -> Cluster), the active thread count is more comprehensive and includes these plus any
+other threads (Input and Output Ports, Funnels, Remote Process Groups, Reporting Tasks, etc.)

=== Component Statistics

@@ -612,10 +584,10 @@ by the component. These statistics provide information about how much data has b
window and allows us to see things like the number of FlowFiles that have been consumed by a Processor, as well as the number of FlowFiles that have been emitted by the Processor.

-The connections between Processors also expose the number of items that are currently queued.
+The connections between Processors also expose several statistics about items that pass through the connection.

It may also be valuable to see historical values for these metrics and, if clustered, how the different nodes compare to one another.
-In order to see this information, we can right-click on a component and choose the `Stats` menu item. This will show us a graph that spans
+In order to see this information, we can right-click on a component and choose the `View Status History` menu item. This will show us a graph that spans
the time since NiFi was started, or up to 24 hours, whichever is less. The amount of time that is shown here can be extended or
reduced by changing the configuration in the properties file.
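The properties file referred to here is `conf/nifi.properties` in the NiFi home directory. A minimal sketch of how to inspect the relevant settings, assuming the property names and default values shown below (verify them against your own installation):

[source,shell]
----
# List the component status history settings
grep "nifi.components.status" conf/nifi.properties

# Typical defaults: one snapshot per minute, with 1440 snapshots
# retained, which is where the 24-hour window comes from
# nifi.components.status.repository.buffer.size=1440
# nifi.components.status.snapshot.frequency=1 min
----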
@@ -656,9 +628,9 @@ choose which Attributes will be important to your specific dataflows and make th

[[EventDetails]]
=== Event Details

Once we have performed our search, our table will be populated only with the events that match the search criteria. From here, we
-can choose the Info icon (
-image:iconDetails.png[Details Icon]
-) on the left-hand side of the table to view the details of that event:
+can click the kebab icon (
+image:iconKebab.png["Menu"]
+) on the right-hand side of the table and choose `View Details` for that event:

image:event-details.png[Event Details]

@@ -692,10 +664,10 @@ this iterative development of the flow until it is processing the data exactly a

=== Lineage Graph

-In addition to viewing the details of a Provenance event, we can also view the lineage of the FlowFile involved by clicking on the Lineage Icon (
-image:iconLineage.png[Lineage]
-) from the table view.
-
+In addition to viewing the details of a Provenance event, we can also view the lineage of the FlowFile involved.
+Click the kebab icon (
+image:iconKebab.png["Menu"]
+) on the right-hand side of the table and choose `Show Lineage` for that event.
This provides us with a graphical representation of exactly what happened to that piece of data as it traversed the system:

image:lineage-graph-annotated.png[Lineage Graph]

@@ -722,7 +694,7 @@ addition to this Getting Started Guide:
lengthy discussions of all of the different components that comprise the application. This guide is written with the NiFi Operator as its audience. It provides information on each of the different components available in NiFi and explains how to use the different features provided by the application.
-- link:administration-guide.html[Administration Guide] - A guide for setting up and administering Apache NiFi for production environments.
+- link:administration-guide.html[Administrator's Guide] - A guide for setting up and administering Apache NiFi for production environments.
This guide provides information about the different system-level settings, such as setting up clusters of NiFi and securing access to the web UI and data.
- link:expression-language-guide.html[Expression Language Guide] - A far more exhaustive guide for understanding the Expression Language than
@@ -734,12 +706,10 @@ addition to this Getting Started Guide:
- link:https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide[Contributor's Guide^] - A guide for explaining how to contribute work back to the Apache NiFi community so that others can make use of it.

-Several blog postings have also been added to the Apache NiFi blog site:
-link:https://blogs.apache.org/nifi/[https://blogs.apache.org/nifi/^]
-
-In addition to the blog and guides provided here, you can browse the different
-link:https://nifi.apache.org/mailing_lists.html[NiFi Mailing Lists^] or send an e-mail to one of the mailing lists at
+In addition to the guides provided here, you can browse the different
+link:https://nifi.apache.org/community/contact/[NiFi Mailing Lists^] or send an e-mail to one of the mailing lists at
link:mailto:users@nifi.apache.org[users@nifi.apache.org] or link:mailto:dev@nifi.apache.org[dev@nifi.apache.org].
-Many of the members of the NiFi community are also available on Twitter and actively monitor for tweets that mention @apachenifi.
+Many of the members of the NiFi community are available on link:https://apachenifi.slack.com[Apache NiFi on Slack^]
+and also actively monitor X/Twitter for posts that mention @apachenifi.
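The ListenHTTP Processor described in the <<HTTP>> section above can be smoke-tested from the command line. A minimal sketch, assuming a running flow with ListenHTTP configured to use Listening Port `8081` (an illustrative value, since the port has no default) and the default `contentListener` Base Path:

[source,shell]
----
# POST a small payload to the ListenHTTP endpoint; the Processor writes
# the request body out as a FlowFile and returns a 200 response
curl -X POST --data 'hello nifi' http://localhost:8081/contentListener
----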
diff --git a/nifi-docs/src/main/asciidoc/images/iconInfo2.png b/nifi-docs/src/main/asciidoc/images/iconInfo2.png
new file mode 100644
index 0000000000000000000000000000000000000000..7178642d65ccfc13864e034caaac537c0efb3c0b
GIT binary patch
literal 301