NIFI-4791 Made external links open in new windows and fixed some broken links

Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #2413.
Andrew Lim 2018-01-18 16:36:54 -05:00 committed by Pierre Villard
parent 94d0974003
commit 59970344fe
7 changed files with 81 additions and 76 deletions


@ -17,6 +17,7 @@
= NiFi System Administrator's Guide
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org
:linkattrs:
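The newly added `:linkattrs:` document attribute enables per-link attributes in AsciiDoc; with it set, a trailing caret inside a link macro's bracketed text makes the rendered link open in a new window (it is rendered with `target="_blank"`). For example:

```
link:http://nifi.apache.org[Apache NiFi homepage^]
```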
== System Requirements
Apache NiFi can run on something as simple as a laptop, but it can also be clustered across many enterprise-class servers. Therefore, the amount of hardware and memory needed will depend on the size and nature of the dataflow involved. The data is stored on disk while NiFi is processing it. So NiFi needs to have sufficient disk space allocated for its various repositories, particularly the content repository, flowfile repository, and provenance repository (see the <<system_properties>> section for more information about these repositories). NiFi has the following minimum system requirements:
@ -273,7 +274,7 @@ For a client certificate that can be easily imported into the browser, specify:
[[user_authentication]]
== User Authentication
NiFi supports user authentication via client certificates, via username/password, via Apache Knox, or via link:http://openid.net/connect[OpenId Connect^].

Username/password authentication is performed by a 'Login Identity Provider'. The Login Identity Provider is a pluggable mechanism for
authenticating users via their username/password. Which Login Identity Provider to use is configured in the _nifi.properties_ file.
@ -292,7 +293,7 @@ token during authentication.
NOTE: NiFi can only be configured for username/password, OpenId Connect, or Apache Knox at a given time. It does not support running each of
these concurrently. NiFi will require client certificates for authenticating users over HTTPS if none of these are configured.

A secured instance of NiFi cannot be accessed anonymously unless configured to use an LDAP or Kerberos Login Identity Provider, which in turn must be configured to explicitly allow anonymous access. Anonymous access is not currently possible by the default FileAuthorizer (see <<authorizer-configuration>>), but is a future effort (link:https://issues.apache.org/jira/browse/NIFI-2730[NIFI-2730^]).

NOTE: NiFi does not perform user authentication over HTTP. Using HTTP, all users will be granted all roles.
@ -404,7 +405,7 @@ To enable authentication via OpenId Connect the following properties must be con
[options="header,footer"]
|==================================================================================================================================================
| Property Name | Description
|`nifi.security.user.oidc.discovery.url` | The discovery URL for the desired OpenId Connect Provider (link:http://openid.net/specs/openid-connect-discovery-1_0.html[http://openid.net/specs/openid-connect-discovery-1_0.html^]).
|`nifi.security.user.oidc.connect.timeout` | Connect timeout when communicating with the OpenId Connect Provider.
|`nifi.security.user.oidc.read.timeout` | Read timeout when communicating with the OpenId Connect Provider.
|`nifi.security.user.oidc.client.id` | The client id for NiFi after registration with the OpenId Connect Provider.
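Assembled from the table rows shown in this hunk, a minimal sketch of the matching entries in _nifi.properties_ (the discovery URL and client id values are illustrative, not defaults):

```
nifi.security.user.oidc.discovery.url=https://login.example.com/.well-known/openid-configuration
nifi.security.user.oidc.connect.timeout=5 secs
nifi.security.user.oidc.read.timeout=5 secs
nifi.security.user.oidc.client.id=nifi-client
```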
@ -1264,7 +1265,7 @@ The `EncryptContent` processor allows for the encryption and decryption of data,
[[key-derivation-functions]]
=== Key Derivation Functions

Key Derivation Functions (KDF) are mechanisms by which human-readable information, usually a password or other secret information, is translated into a cryptographic key suitable for data protection. For further information, read the link:https://en.wikipedia.org/wiki/Key_derivation_function[Wikipedia entry on Key Derivation Functions^].

Currently, KDFs are ingested by `CipherProvider` implementations and return a fully-initialized `Cipher` object to be used for encryption or decryption. Due to the use of a `CipherProviderFactory`, the KDFs are not customizable at this time. Future enhancements will include the ability to provide custom cost parameters to the KDF at initialization time. As a work-around, `CipherProvider` instances can be initialized with custom cost parameters in the constructor, but this is not currently supported by the `CipherProviderFactory`.

Here are the KDFs currently supported by NiFi (primarily in the `EncryptContent` processor for password-based encryption (PBE)) and relevant notes:
@ -1276,16 +1277,16 @@ Here are the KDFs currently supported by NiFi (primarily in the `EncryptContent`
** This KDF is provided for compatibility with data encrypted using OpenSSL's default PBE, known as `EVP_BytesToKey`. This is a single iteration of MD5 over the concatenation of the password and 8 bytes of random ASCII salt. OpenSSL recommends using `PBKDF2` for key derivation but does not expose the library method necessary to the command-line tool, so this KDF is still the de facto default for command-line encryption.
* Bcrypt
** This KDF was added in v0.5.0.
** link:https://en.wikipedia.org/wiki/Bcrypt[Bcrypt^] is an adaptive function based on the link:https://en.wikipedia.org/wiki/Blowfish_(cipher)[Blowfish^] cipher. This KDF is strongly recommended as it automatically incorporates a random 16 byte salt and a configurable cost parameter (or "work factor"), and is hardened against brute-force attacks using link:https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units[GPGPU^] (which share memory between cores) by requiring access to "large" blocks of memory during the key derivation. It is less resistant to link:https://en.wikipedia.org/wiki/Field-programmable_gate_array[FPGA^] brute-force attacks where the gate arrays have access to individual embedded RAM blocks.
** Because the length of a Bcrypt-derived key is always 184 bits, the complete output is then fed to a `SHA-512` digest and truncated to the desired key length. This provides the benefit of the avalanche effect on the formatted input.
** The recommended minimum work factor is 12 (2^12^ key derivation rounds) (as of 2/1/2016 on commodity hardware) and should be increased to the threshold at which legitimate systems will encounter detrimental delays (see schedule below or use `BcryptCipherProviderGroovyTest#testDefaultConstructorShouldProvideStrongWorkFactor()` to calculate safe minimums).
** The salt format is `$2a$10$ABCDEFGHIJKLMNOPQRSTUV`. The salt is delimited by `$` and the three sections are as follows:
*** `2a` - the version of the format. An extensive explanation can be found link:http://blog.ircmaxell.com/2012/12/seven-ways-to-screw-up-bcrypt.html[here^]. NiFi currently uses `2a` for all salts generated internally.
*** `10` - the work factor. This is actually the log~2~ value, so the total iteration count would be 2^10^ in this case.
*** `ABCDEFGHIJKLMNOPQRSTUV` - the 22 character, Base64-encoded, unpadded, raw salt value. This decodes to a 16 byte salt used in the key derivation.
* Scrypt
** This KDF was added in v0.5.0.
** link:https://en.wikipedia.org/wiki/Scrypt[Scrypt^] is an adaptive function designed in response to `bcrypt`. This KDF is recommended as it requires relatively large amounts of memory for each derivation, making it resistant to hardware brute-force attacks.
** The recommended minimum cost is `N`=2^14^, `r`=8, `p`=1 (as of 2/1/2016 on commodity hardware) and should be increased to the threshold at which legitimate systems will encounter detrimental delays (see schedule below or use `ScryptCipherProviderGroovyTest#testDefaultConstructorShouldProvideStrongParameters()` to calculate safe minimums).
** The salt format is `$s0$e0101$ABCDEFGHIJKLMNOPQRSTUV`. The salt is delimited by `$` and the three sections are as follows:
*** `s0` - the version of the format. NiFi currently uses `s0` for all salts generated internally.
@ -1294,7 +1295,7 @@ Here are the KDFs currently supported by NiFi (primarily in the `EncryptContent`
*** `ABCDEFGHIJKLMNOPQRSTUV` - the 12-44 character, Base64-encoded, unpadded, raw salt value. This decodes to an 8-32 byte salt used in the key derivation.
* PBKDF2
** This KDF was added in v0.5.0.
** link:https://en.wikipedia.org/wiki/PBKDF2[Password-Based Key Derivation Function 2^] is an adaptive derivation function which uses an internal pseudorandom function (PRF) and iterates it many times over a password and salt (at least 16 bytes).
** The PRF is recommended to be `HMAC/SHA-256` or `HMAC/SHA-512`. The use of an HMAC cryptographic hash function mitigates a length extension attack.
** The recommended minimum number of iterations is 160,000 (as of 2/1/2016 on commodity hardware). This number should be doubled every two years (see schedule below or use `PBKDF2CipherProviderGroovyTest#testDefaultConstructorShouldProvideStrongIterationCount()` to calculate safe minimums).
** This KDF is not memory-hard (it can be parallelized massively with commodity hardware) but is still recommended as sufficient by link:http://csrc.nist.gov/publications/nistpubs/800-132/nist-sp800-132.pdf[NIST SP 800-132 (PDF)^] and many cryptographers (when used with a proper iteration count and HMAC cryptographic hash function).
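As a standalone illustration of the Bcrypt salt format described above (a sketch, not NiFi's internal code; the class name is invented), the three `$`-delimited sections can be pulled apart as follows:

```java
// Sketch of parsing the Bcrypt salt format described above:
// $2a$10$ABCDEFGHIJKLMNOPQRSTUV -> version, log2 work factor, 22-char raw salt.
public class BcryptSaltParser {

    // Returns {version, workFactor, rawSalt} for a salt like "$2a$10$..."
    public static String[] parse(String salt) {
        String[] parts = salt.split("\\$");
        // parts[0] is empty because the string begins with the '$' delimiter
        if (parts.length != 4) {
            throw new IllegalArgumentException("Unexpected Bcrypt salt format: " + salt);
        }
        return new String[] { parts[1], parts[2], parts[3] };
    }

    public static void main(String[] args) {
        String[] fields = parse("$2a$10$ABCDEFGHIJKLMNOPQRSTUV");
        // work factor 10 means 2^10 key derivation rounds
        System.out.println("version=" + fields[0] + ", workFactor=" + fields[1]
                + ", salt=" + fields[2]);
    }
}
```

NiFi performs the equivalent parsing internally when decrypting Bcrypt-protected content; the sketch only demonstrates the format.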
@ -1304,18 +1305,18 @@ Here are the KDFs currently supported by NiFi (primarily in the `EncryptContent`
==== Additional Resources

* link:http://stackoverflow.com/a/30308723/70465[Explanation of optimal scrypt cost parameters and relationships^]
* link:http://csrc.nist.gov/publications/nistpubs/800-132/nist-sp800-132.pdf[NIST Special Publication 800-132^]
* link:https://www.owasp.org/index.php/Password_Storage_Cheat_Sheet#Work_Factor[OWASP Password Storage Work Factor Calculations^]
* link:http://security.stackexchange.com/a/3993/16485[PBKDF2 rounds calculations^]
* link:http://blog.ircmaxell.com/2014/03/why-i-dont-recommend-scrypt.html[Scrypt as KDF vs password storage vulnerabilities^]
* link:http://security.stackexchange.com/a/26253/16485[Scrypt vs. Bcrypt (as of 2010)^]
* link:http://security.stackexchange.com/a/6415/16485[Bcrypt vs PBKDF2^]
* link:http://wildlyinaccurate.com/bcrypt-choosing-a-work-factor/[Choosing a work factor for Bcrypt^]
* link:https://docs.spring.io/spring-security/site/docs/current/apidocs/org/springframework/security/crypto/bcrypt/BCrypt.html[Spring Security Bcrypt^]
* link:https://www.openssl.org/docs/man1.1.0/crypto/EVP_BytesToKey.html[OpenSSL EVP BytesToKey PKCS#1v1.5^]
* link:https://wiki.openssl.org/index.php/Manual:PKCS5_PBKDF2_HMAC(3)[OpenSSL PBKDF2 KDF^]
* link:http://security.stackexchange.com/a/29139/16485[OpenSSL KDF flaws description^]
=== Salt and IV Encoding
@ -1347,7 +1348,7 @@ image:pbkdf2-salt.png["PBKDF2 Salt & IV Encoding"]
=== Java Cryptography Extension (JCE) Limited Strength Jurisdiction Policies

Because of US export regulations, default JVMs have link:http://docs.oracle.com/javase/7/docs/technotes/guides/security/SunProviders.html#importlimits[limits imposed on the strength of cryptographic operations^] available to them. For example, AES operations are limited to `128 bit keys` by default. While `AES-128` is cryptographically safe, this can have unintended consequences, specifically on Password-based Encryption (PBE).

PBE is the process of deriving a cryptographic key for encryption or decryption from _user-provided secret material_, usually a password. Rather than a human remembering a (random-appearing) 32 or 64 character hexadecimal string, a password or passphrase is used.
@ -1426,7 +1427,7 @@ image:allow-weak-crypto.png["Allow Insecure Cryptographic Modes", width=940]
On a JVM with limited strength cryptography, some PBE algorithms limit the maximum password length to 7, and in this case it will not be possible to provide a "safe" password. It is recommended to install the JCE Unlimited Strength Jurisdiction Policy files for the JVM to mitigate this issue.

* link:http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html[JCE Unlimited Strength Jurisdiction Policy files for Java 8^]

If on a system where the unlimited strength policies cannot be installed, it is recommended to switch to an algorithm that supports longer passwords (see table above).
@ -1436,7 +1437,7 @@ If on a system where the unlimited strength policies cannot be installed, it is
If it is not possible to install the unlimited strength jurisdiction policies, the `Allow Weak Crypto` setting can be changed to `allowed`, but *this is _not_ recommended*. Changing this setting explicitly acknowledges the inherent risk in using weak cryptographic configurations.
=====================

It is preferable to request upstream/downstream systems to switch to link:https://cwiki.apache.org/confluence/display/NIFI/Encryption+Information[keyed encryption^] or use a "strong" link:https://cwiki.apache.org/confluence/display/NIFI/Key+Derivation+Function+Explanations[Key Derivation Function (KDF) supported by NiFi^].
== Encrypted Passwords in Configuration Files
@ -2126,7 +2127,7 @@ echo 2 > state/zookeeper/myid
And so on.

For more information on the properties used to administer ZooKeeper, see the
link:https://zookeeper.apache.org/doc/current/zookeeperAdmin.html[ZooKeeper Admin Guide^].

For information on securing the embedded ZooKeeper Server, see the <<securing_zookeeper>> section below.
@ -2160,8 +2161,7 @@ In order to secure the communications, we need to ensure that both the client an
NiFi ZooKeeper client and embedded ZooKeeper server to use Kerberos are provided below.

If Kerberos is not already set up in your environment, you can find information on installing and setting up a Kerberos Server at
link:https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Configuring_a_Kerberos_5_Server.html[https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Managing_Smart_Cards/Configuring_a_Kerberos_5_Server.html^]. This guide assumes that Kerberos has already been installed in the environment in which NiFi is running.

Note, the following procedures for kerberizing an Embedded ZooKeeper server in your NiFi Node and kerberizing a ZooKeeper NiFi client will require that
Kerberos client libraries be installed. This is accomplished in Fedora-based Linux distributions via:
@ -2335,8 +2335,8 @@ Failure to do so, may result in errors similar to the following:
[source]
2016-01-08 16:08:57,888 ERROR [pool-26-thread-1-SendThread(localhost:2181)] o.a.zookeeper.client.ZooKeeperSaslClient An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7) - LOOKING_UP_SERVER)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.

If there are problems communicating or authenticating with Kerberos, this
link:http://docs.oracle.com/javase/7/docs/technotes/guides/security/jgss/tutorials/Troubleshooting.html[Troubleshooting Guide^] may be of value.

One of the most important notes in the above Troubleshooting guide is the mechanism for turning on Debug output for Kerberos.
This is done by setting the `sun.security.krb5.debug` system property.
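In practice `sun.security.krb5.debug` is read by the JVM as a system property, so for NiFi it can be supplied as an extra Java argument, for example in _conf/bootstrap.conf_ (the argument index below is illustrative, not a reserved value):

```
# Illustrative bootstrap.conf line enabling Kerberos debug output
java.arg.20=-Dsun.security.krb5.debug=true
```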
@ -2363,7 +2363,7 @@ For example, you may want to use the ZooKeeper Migrator when you are:
* Migrating from an external ZooKeeper to an embedded ZooKeeper in NiFi 1.x

The NiFi ZooKeeper Migrator is part of the NiFi Toolkit and is downloaded separately from the
link:https://nifi.apache.org/download.html[Apache NiFi download page^].
[[zk_migrator_command_line_parameters]]
==== zk-migrator.sh Command Line Parameters
@ -2676,9 +2676,9 @@ See <<kerberos_properties>> for complete documentation.
=== Notes === Notes
* Kerberos is case-sensitive in many places and the error messages (or lack thereof) may not be sufficiently explanatory. Check the case sensitivity of the service principal in your configuration files. Convention is `HTTP/fully.qualified.domain@REALM`.
* Browsers have varying levels of restriction when dealing with SPNEGO negotiations. Some will provide the local Kerberos ticket to any domain that requests it, while others whitelist the trusted domains. See link:http://docs.spring.io/autorepo/docs/spring-security-kerberos/1.0.2.BUILD-SNAPSHOT/reference/htmlsingle/#browserspnegoconfig[Spring Security Kerberos - Reference Documentation: Appendix E. Configure browsers for SPNEGO Negotiation^] for common browsers.
* Some browsers (legacy IE) do not support recent encryption algorithms such as AES, and are restricted to legacy algorithms (DES). This should be noted when generating keytabs.
* The KDC must be configured and a service principal defined for NiFi and a keytab exported. Comprehensive instructions for Kerberos server configuration and administration are beyond the scope of this document (see link:http://web.mit.edu/kerberos/krb5-current/doc/admin/index.html[MIT Kerberos Admin Guide^]), but an example is below:
Adding a service principal for a server at `nifi.nifi.apache.org` and exporting the keytab from the KDC:
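A sketch of that KDC session using MIT Kerberos tooling (the principal follows the `HTTP/fully.qualified.domain@REALM` convention above; the keytab path is illustrative, and exact commands vary by Kerberos distribution):

```shell
# On the KDC: create the service principal with a random key,
# then export its keytab (xst is kadmin's keytab-export command).
kadmin.local -q "addprinc -randkey HTTP/nifi.nifi.apache.org"
kadmin.local -q "xst -k /tmp/nifi.keytab HTTP/nifi.nifi.apache.org"
```

The exported keytab file is then copied to the NiFi host and referenced from `nifi.properties`.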


@ -17,6 +17,7 @@
= NiFi Developer's Guide
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org
:linkattrs:
== Introduction
@ -569,21 +570,21 @@ The NiFi API provides notification support through use of Java
Annotations. The `org.apache.nifi.annotations.notification` package
contains several annotations for notification management. The following
annotations may be applied to Java methods in a NiFi component to
indicate to the framework when the methods should be called. For the
discussion of Component Notification, we will define a NiFi component
as a *Processor*, *Controller Service*, or *Reporting Task*.
==== @OnPrimaryNodeStateChange
The `@OnPrimaryNodeStateChange` annotation causes a method to be invoked
as soon as the state of the Primary Node in a cluster has changed.
Methods with this annotation should take either no arguments or one
argument of type `PrimaryNodeState`. The `PrimaryNodeState` provides
context about what changed so that the component can take appropriate
action. The `PrimaryNodeState` enumerator has two possible values:
`ELECTED_PRIMARY_NODE` (the node receiving this
state has been elected the Primary Node of the NiFi cluster), or
`PRIMARY_NODE_REVOKED` (the node receiving this state was the Primary
Node but has now had its Primary Node role revoked).
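As a sketch, a component method using this annotation might look like the following (the class and method names are arbitrary; this assumes the `nifi-api` artifact is on the classpath and uses the package named above):

```java
import org.apache.nifi.annotations.notification.OnPrimaryNodeStateChange;
import org.apache.nifi.annotations.notification.PrimaryNodeState;

public class PrimaryNodeAwareComponent {
    // Sketch only: the framework invokes this method when the
    // Primary Node state changes for this node.
    @OnPrimaryNodeStateChange
    public void onPrimaryNodeStateChange(final PrimaryNodeState state) {
        if (state == PrimaryNodeState.ELECTED_PRIMARY_NODE) {
            // this node is now the Primary Node: begin primary-only work
        } else if (state == PrimaryNodeState.PRIMARY_NODE_REVOKED) {
            // role revoked: stop primary-only work
        }
    }
}
```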
[[restricted]]
@ -1977,7 +1978,7 @@ There are two UI extension points that are available in NiFi:
- Custom Processor UIs
- Content Viewers
Custom UIs can be created to provide configuration options beyond the standard property/value tables available in most processor settings. Examples of processors with Custom UIs are link:https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-update-attribute-bundle[UpdateAttribute^] and link:https://github.com/apache/nifi/tree/master/nifi-nar-bundles/nifi-standard-bundle[JoltTransformJSON^].
Content Viewers can be created to extend the types of data that can be viewed within NiFi. NiFi comes with NARs in the lib directory which contain content viewers for data types such as csv, xml, avro, json (standard-nar) and image types such as png, jpeg and gif (media-nar).
@ -2539,18 +2540,18 @@ The back end of Apache NiFi is written in Java. The web tier makes use of JAX-RS
used to provide a user interface. We depend on several third-party JavaScript libraries, including D3 and JQuery,
among others. We make use of Apache Maven for our builds and Git for our version control system.
Documentation is created in link:http://asciidoctor.org[AsciiDoc^].
=== Where to Start?
link:http://issues.apache.org/jira/browse/NIFI[NiFi's JIRA page^] can be used to find tickets that are tagged as "beginner",
or you can dig into any of the tickets for creating Processors. Processors should be self-contained and not rely on other
outside components (except for Controller Services), so they make for excellent starting points for new NiFi developers to
get started. This exposes the developer to the NiFi API and is the most extensible part of the dataflow system.
System-level and overview documentation is located in '<code checkout location>/nifi/nifi-docs/src/main/asciidoc'.
Tools available to facilitate documentation generation are available at link:http://asciidoctor.org/docs/editing-asciidoc-with-live-preview/[Editing AsciiDoc with Live Preview^].
=== Supplying a contribution


@ -17,6 +17,7 @@
= Apache NiFi Expression Language Guide
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org
:linkattrs:
[[overview]]
== Overview
@ -191,7 +192,7 @@ Language supports four different data types:
- *Decimal*: A Decimal is a numeric value that can support decimals and larger values with minimal loss of precision. More precisely it
is a double-precision 64-bit IEEE 754 floating point. Due to this minimal loss of precision this data type should not be used for
very precise values, such as currency. For more documentation on the range of values stored in this data type
refer to this link:https://docs.oracle.com/javase/specs/jls/se7/html/jls-4.html#jls-4.2.3[link^]. The following are some examples of the forms of
literal decimals that are supported in expression language (the "E" can also be lower-case):
* 1.1
@ -262,7 +263,7 @@ subject exists and `false` otherwise.#
[.function]
=== isEmpty
*Description*: [.description]#The `isEmpty` function returns `true` if the Subject is null, does not contain any characters
or contains only white-space (new line, carriage return, space, tab), `false` otherwise.#
*Subject Type*: [.subject]#String#
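A minimal Java sketch of these semantics (not NiFi's actual implementation): the check is true for null, the empty string, or a string containing only whitespace.

```java
public class IsEmptySketch {
    static boolean isEmpty(String subject) {
        // String.trim() strips space, tab, new line and carriage return,
        // so a whitespace-only subject trims down to the empty string.
        return subject == null || subject.trim().isEmpty();
    }
}
```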
@ -1716,7 +1717,7 @@ must be valid for the base being converted from. For example converting "0xFF" f
[.function]
=== math
*Description*: [.description]#ADVANCED FEATURE. This expression is designed to be used by advanced users only. It utilizes Java Reflection to run arbitrary java.lang.Math static methods. The exact API will depend on the version of Java you are running. The Java 8 API can be found here: link:https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html[https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html^]
+
In order to run the correct method, the parameter types must be correct. The Expression Language "Number" (whole number) type is interpreted as a Java "long". The "Decimal" type is interpreted as a Java "double". Running the desired method may require calling "toNumber()" or "toDecimal()" in order to "cast" the value to the desired type. This also is important to remember when cascading "math()" calls since the return type depends on the method that was run.#
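An illustrative sketch (not NiFi's actual implementation) of the reflective dispatch described above: look up a `java.lang.Math` static method by name and invoke it. Because "Number" maps to `long` and "Decimal" maps to `double`, the argument types decide which overload is resolved.

```java
import java.lang.reflect.Method;

public class MathReflection {
    static Object invokeMath(String name, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            // EL "Number" -> long, "Decimal" -> double; mirror that mapping
            types[i] = (args[i] instanceof Long) ? long.class : double.class;
        }
        try {
            Method m = Math.class.getMethod(name, types);
            return m.invoke(null, args); // static method, so no receiver
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeMath("max", 2L, 7L)); // resolves Math.max(long, long)
        System.out.println(invokeMath("floor", 3.9));  // resolves Math.floor(double)
    }
}
```

Passing a `double` where a `long` overload exists (or vice versa) fails the lookup, which is why `toNumber()`/`toDecimal()` casts matter.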
@ -1751,7 +1752,7 @@ In order to run the correct method, the parameter types must be correct. The Exp
*Description*: [.description]#Formats a number as a date/time according to the format specified by the argument. The argument
must be a String that is a valid Java SimpleDateFormat format. The Subject is expected to be a Number that
represents the number of milliseconds since Midnight GMT on January 1, 1970. The number will be evaluated using the local
time zone unless specified in the second optional argument.#
*Subject Type*: [.subject]#Number#
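The underlying conversion can be sketched in plain Java (the method name here is illustrative): epoch milliseconds in, formatted string out, with the time zone parameter mirroring the function's optional second argument.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class FormatDateSketch {
    static String formatMillis(long millis, String pattern, String timeZone) {
        SimpleDateFormat sdf = new SimpleDateFormat(pattern);
        // NiFi defaults to the local zone; an explicit zone makes it deterministic
        sdf.setTimeZone(TimeZone.getTimeZone(timeZone));
        return sdf.format(new Date(millis));
    }

    public static void main(String[] args) {
        // 0 ms since Midnight GMT on January 1, 1970, rendered in GMT
        System.out.println(formatMillis(0L, "yyyy/MM/dd HH:mm:ss", "GMT"));
        // prints 1970/01/01 00:00:00
    }
}
```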


@ -17,6 +17,7 @@
= Getting Started with Apache NiFi
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org
:linkattrs:
== Who is This Guide For?
@ -57,7 +58,7 @@ dataflows.
== Downloading and Installing NiFi
NiFi can be downloaded from the link:http://nifi.apache.org/download.html[NiFi Downloads Page^]. There are two packaging options
available: a "tarball" that is tailored more to Linux and a zip file that is more applicable for Windows users. Mac OS X users
may also use the tarball or can install via Homebrew.
@ -111,7 +112,7 @@ and `sudo service nifi stop`. Additionally, the running status can be checked vi
== I Started NiFi. Now What?
Now that NiFi has been started, we can bring up the User Interface (UI) in order to create and monitor our dataflow.
To get started, open a web browser and navigate to link:http://localhost:8080/nifi[`http://localhost:8080/nifi`^]. The port can be changed by
editing the `nifi.properties` file in the NiFi `conf` directory, but the default port is 8080.
This will bring up the User Interface, which at this point is a blank canvas for orchestrating a dataflow:
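The relevant entry is `nifi.web.http.port`; a sketch of the default in `conf/nifi.properties`:

```
# conf/nifi.properties -- the HTTP port the UI listens on (default shown)
nifi.web.http.port=8080
```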
@ -741,14 +742,14 @@ addition to this Getting Started Guide:
- link:developer-guide.html[Developer's Guide] - While not an exhaustive guide to All Things NiFi Development, this guide does provide a
comprehensive overview of the different APIs available and how they should be used. In addition, it provides Best Practices for developing
NiFi components and common Processor idioms to aid in understanding the logic behind many of the existing NiFi components.
- link:https://cwiki.apache.org/confluence/display/NIFI/Contributor+Guide[Contributor's Guide^] - A guide explaining how to contribute
work back to the Apache NiFi community so that others can make use of it.
Several blog postings have also been added to the Apache NiFi blog site:
link:https://blogs.apache.org/nifi/[https://blogs.apache.org/nifi/^]
In addition to the blog and guides provided here, you can browse the different
link:https://nifi.apache.org/mailing_lists.html[NiFi Mailing Lists^] or send an e-mail to one of the mailing lists at
link:mailto:users@nifi.apache.org[users@nifi.apache.org] or
link:mailto:dev@nifi.apache.org[dev@nifi.apache.org].


@ -17,6 +17,7 @@
= Apache NiFi In Depth
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org
:linkattrs:
== Intro
This advanced level document is aimed at providing an in-depth look at the implementation and design decisions of NiFi. It assumes the reader has read enough of the other documentation to know the basics of NiFi.
@ -49,7 +50,7 @@ NiFi protects against hardware and system failures by keeping a record of what w
If the Node was in the middle of writing content when it went down, nothing is corrupted, thanks to the Copy On Write (mentioned below) and Immutability (mentioned above) paradigms. Since FlowFile transactions never modify the original content (pointed to by the content pointer), the original is safe. When NiFi goes down, the write claim for the change is orphaned and then cleaned up by the background garbage collection. This provides a “rollback” to the last known stable state.
The Node then restores its state from the FlowFile. For a more in-depth, step-by-step explanation of the process, see link:https://cwiki.apache.org/confluence/display/NIFI/NiFi%27s+Write-Ahead+Log+Implementation[NiFi's Write-Ahead Log Implementation^].
This setup, in terms of transactional units of work, allows NiFi to be very resilient in the face of adversity, ensuring that even if NiFi is suddenly killed, it can pick back up without any loss of data.
@ -84,7 +85,7 @@ Keep in mind, though, that since Provenance is not copying the content in the Co
NOTE: Since provenance events are snapshots of the FlowFile, as it exists in the current flow, changes to the flow may impact the ability to replay provenance events later on. For example, if a Connection is deleted from the flow, the data cannot be replayed from that point in the flow, since there is now nowhere to enqueue the data for processing.
For a look at the design decisions behind the Provenance Repository, check out link:https://cwiki.apache.org/confluence/display/NIFI/Persistent+Provenance+Repository+Design[Persistent Provenance Repository Design^].
==== Deeper View: Provenance Log Files
Each provenance event has two maps, one for the attributes before the event and one for the updated attribute values. In general, provenance events don't store the updated values of the attributes as they existed when the event was emitted, but instead, the attribute values when the session is committed. The events are cached and saved until the session is committed and once the session is committed the events are emitted with the attributes associated with the FlowFile when the session is committed. The exception to this rule is the "SEND" event, in which case the event contains the attributes as they existed when the event was emitted. This is done because if the attributes themselves were also sent, it is important to have an accurate account of exactly what information was sent.
@ -106,7 +107,7 @@ It is considered a best practice to analyze the contents of a FlowFile as few ti
== Life of a FlowFile
To better understand how the repos interact with one another, the underlying functionality of NiFi, and the life of a FlowFile, this next section will include examples of a FlowFile at different points in a real flow. The flow is a template called "WebCrawler.xml" and is available here: link:https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates[https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates^].
At a high level, this template reaches out to a seed URL configured in the GetHTTP processor, then analyzes the response using the RouteText processor to find instances of a keyword (in this case "nifi"), and potential URLs to hit. Then InvokeHTTP executes an HTTP GET request using the URLs found in the original seed web page. The response is routed based on the status code attribute and only 200-202 status codes are routed back to the original RouteText processor for analysis.


@ -17,6 +17,7 @@
= Apache NiFi Overview
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org
:linkattrs:
== What is Apache NiFi?
Put simply, NiFi was built to automate the flow of data between systems. While
@ -272,18 +273,19 @@ Flexible Scaling Model::
help. Use of NiFi's 'site-to-site' feature is also very effective as it is a protocol that allows NiFi and a client (including another NiFi cluster) to talk to each other, share information
about loading, and to exchange data on specific authorized ports.
Scale-up & down;;
NiFi is also designed to scale-up and down in a very flexible manner. In terms of increasing throughput from the standpoint of the NiFi framework, it is possible to increase the number of concurrent tasks on the processor under the Scheduling tab when configuring. This allows more processes to execute simultaneously, providing greater throughput. On the other side of the spectrum, you can perfectly scale NiFi down to be suitable to run on edge devices where a small footprint is desired due to limited hardware resources. To specifically solve the first mile data collection challenge and edge use cases, you can find more details here: link:https://cwiki.apache.org/confluence/display/NIFI/MiNiFi[https://cwiki.apache.org/confluence/display/NIFI/MiNiFi^] regarding a child project effort of Apache NiFi, MiNiFi (pronounced "minify", [min-uh-fahy]).
== References
[bibliography]
- [[[eip]]] Gregor Hohpe. Enterprise Integration Patterns [online]. Retrieved: 27 Dec 2014, from: link:http://www.enterpriseintegrationpatterns.com[http://www.enterpriseintegrationpatterns.com^]
- [[[soa]]] Wikipedia. Service Oriented Architecture [online]. Retrieved: 27 Dec 2014, from: link:http://en.wikipedia.org/wiki/Service-oriented_architecture[http://en.wikipedia.org/wiki/Service-oriented_architecture^]
- [[[api]]] Eric Savitz. Welcome to the API Economy [online]. Forbes.com. Retrieved: 27 Dec 2014, from: link:http://www.forbes.com/sites/ciocentral/2012/08/29/welcome-to-the-api-economy[http://www.forbes.com/sites/ciocentral/2012/08/29/welcome-to-the-api-economy^]
- [[[api2]]] Adam Duvander. The rise of the API economy and consumer-led ecosystems [online]. thenextweb.com. Retrieved: 27 Dec 2014, from: link:http://thenextweb.com/dd/2014/03/28/api-economy[http://thenextweb.com/dd/2014/03/28/api-economy^]
- [[[iot]]] Wikipedia. Internet of Things [online]. Retrieved: 27 Dec 2014, from: link:http://en.wikipedia.org/wiki/Internet_of_Things[http://en.wikipedia.org/wiki/Internet_of_Things^]
- [[[bigdata]]] Wikipedia. Big Data [online]. Retrieved: 27 Dec 2014, from: link:http://en.wikipedia.org/wiki/Big_data[http://en.wikipedia.org/wiki/Big_data^]
- [[[fbp]]] Wikipedia. Flow Based Programming [online]. Retrieved: 28 Dec 2014, from: link:http://en.wikipedia.org/wiki/Flow-based_programming#Concepts[http://en.wikipedia.org/wiki/Flow-based_programming#Concepts^]
- [[[seda]]] Matt Welsh. Berkeley. SEDA: An Architecture for Well-Conditioned,
Scalable Internet Services [online]. Retrieved: 18 Jan 2018, from: link:http://www.mdw.la/papers/seda-sosp01.pdf[http://www.mdw.la/papers/seda-sosp01.pdf^]


@ -17,7 +17,7 @@
= Apache NiFi User Guide
Apache NiFi Team <dev@nifi.apache.org>
:homepage: http://nifi.apache.org
:linkattrs:
== Introduction
Apache NiFi is a dataflow system based on the concepts of flow-based programming. It supports
@ -147,7 +147,7 @@ When a DFM navigates to the UI for the first time, a blank canvas is provided on
image::nifi-toolbar-components.png["NiFi Components Toolbar"]
The Components Toolbar runs across the top left portion of your screen. It consists of the components you can drag onto the
canvas to build your dataflow. Each component is described in more detail in <<building-dataflow>>.
The Status Bar is under the Components Toolbar. The Status Bar provides information about the number of threads that are
currently active in the flow, the amount of data that currently exists in the flow, how many Remote Process Groups exist
@ -628,7 +628,7 @@ For example:
* The string `0 20 14 ? * MON-FRI` indicates that you want to schedule the processor to run at 2:20 PM every Monday through Friday.
* The string `0 15 10 ? * 6L 2011-2017` indicates that you want to schedule the processor to run at 10:15 AM, on the last Friday of every month, between 2011 and 2017.
For additional information and examples, see the link:http://www.quartz-scheduler.org/documentation/quartz-2.x/tutorials/crontrigger.html[CronTrigger Tutorial^] in the Quartz documentation.
Next, the Scheduling Tab provides a configuration option named `Concurrent tasks`. This controls how many threads the Processor
will use. Said a different way, this controls how many FlowFiles should be processed by this Processor at the same time. Increasing
@@ -1761,7 +1761,7 @@ will remove the selection.
[[versioning_dataflow]]
== Versioning a DataFlow
When NiFi is connected to a NiFi Registry, dataflows can be version controlled on the process group level. For more information about NiFi Registry usage and configuration, see the documentation at link:https://nifi.apache.org/docs/nifi-registry-docs/index.html[https://nifi.apache.org/docs/nifi-registry-docs/index.html^].
=== Connecting to a NiFi Registry
To connect NiFi to a Registry, select Controller Settings from the Global Menu.
@@ -1780,7 +1780,7 @@ Click "Add" to complete the registration.
image::registry-client-added.png["Registry Client Added"]
NOTE: Versioned flows are stored and organized in registry buckets. Bucket Policies and Special Privileges configured by the registry administrator determine which buckets a user can import versioned flows from and which buckets a user can save versioned flows to. Information on Bucket Policies and Special Privileges can be found in the NiFi Registry User Guide (link:https://nifi.apache.org/docs/nifi-registry-docs/html/user-guide.html[https://nifi.apache.org/docs/nifi-registry-docs/html/user-guide.html^]).
[[version_states]]
=== Version States
@@ -2005,7 +2005,7 @@ filter the templates to see only those of interest, export, and delete Templates
[[Export_Template]]
==== Exporting a Template
Once a Template has been created, it can be shared with others in the Template Management page.
To export a Template, locate the Template in the table. The Filter in the top-right corner
can be used to help find the appropriate Template if several are available. Then click the Export or Download button (
image:iconExport.png["Export"]
@@ -2015,8 +2015,7 @@ into other instances of NiFi (see <<Import_Template>>).
==== Removing a Template
Once it is decided that a Template is no longer needed, it can be easily removed from the Template Management page. To delete a Template, locate it in the table (the Filter in the top-right corner
may be used to find the appropriate Template if several are available) and click the Delete button (
image:iconDelete.png["Delete"]
). This will prompt for confirmation. After confirming the deletion, the Template will be removed from this table
@@ -2247,10 +2246,10 @@ When switching between implementation "families" (i.e. `VolatileProvenanceRepository`
============
* Switching between unencrypted and encrypted repositories
** If a user has an existing repository (`WriteAheadProvenanceRepository` only -- **not** `PersistentProvenanceRepository`) that is not encrypted and switches their configuration to use an encrypted repository, the application writes an error to the log but starts up. However, previous events are not accessible through the provenance query interface and new events will overwrite the existing events. The same behavior occurs if a user switches from an encrypted repository to an unencrypted repository. Automatic roll-over is a future effort (link:https://issues.apache.org/jira/browse/NIFI-3722[NIFI-3722^]) but NiFi is not intended for long-term storage of provenance events so the impact should be minimal. There are two scenarios for roll-over:
*** Encrypted -> unencrypted -- if the previous repository implementation was encrypted, these events should be handled seamlessly as long as the key provider available still has the keys used to encrypt the events (see **Key Rotation**)
*** Unencrypted -> encrypted -- if the previous repository implementation was unencrypted, these events should be handled seamlessly as the previously recorded events simply need to be read with a plaintext schema record reader and then written back with the encrypted record writer
** There is also a future effort to provide a standalone tool in NiFi Toolkit to encrypt/decrypt an existing provenance repository to make the transition easier. The translation process could take a long time depending on the size of the existing repository, and being able to perform this task outside of application startup would be valuable (link:https://issues.apache.org/jira/browse/NIFI-3723[NIFI-3723^]).
* Multiple repositories -- No additional effort or testing has been applied to multiple repositories at this time. It is possible/likely that issues will occur with repositories on different physical devices. There is no option to provide a heterogeneous environment (i.e. one encrypted and one plaintext repository).
* Corruption -- when a disk is filled or corrupted, there have been reported issues with the repository becoming corrupted and recovery steps are necessary. This is likely to continue to be an issue with the encrypted repository, although still limited in scope to individual records (i.e. an entire repository file won't be irrecoverable due to the encryption).
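As a sketch of how the switch above is expressed in configuration, the provenance repository implementation is selected by a single property in `nifi.properties`. The class names below are the standard unencrypted and encrypted write-ahead implementations; verify them against the documentation for your NiFi version before relying on them:

```properties
# Unencrypted write-ahead provenance repository
nifi.provenance.repository.implementation=org.apache.nifi.provenance.WriteAheadProvenanceRepository

# Encrypted variant -- note that switching between the two does NOT migrate
# existing events (see the roll-over caveats above)
#nifi.provenance.repository.implementation=org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository
```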