Stateless NiFi
Similar to other stream processing frameworks, Stateless NiFi does not acknowledge receipt of incoming data until that data has been written to a destination. In the event of a failure, data can be replayed from the source rather than relying on a stateful content repository. This will not work for all cases (e.g. fire-and-forget HTTP/TCP), but a large portion of use cases have a resilient source to retry from.
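The acknowledge-after-write behavior described above can be sketched as follows. This is a minimal, hypothetical Python illustration of the pattern, not Stateless NiFi's implementation: `ReplayableSource` and `process_next` are invented names, and a real source would expose its own acknowledgement mechanism.

```python
from collections import deque


class ReplayableSource:
    """Hypothetical source that keeps each message until it is acknowledged."""

    def __init__(self, messages):
        self._pending = deque(messages)

    def peek(self):
        # Next unacknowledged message, or None; the message is NOT removed.
        return self._pending[0] if self._pending else None

    def ack(self):
        # Acknowledgement is the only point at which a message is consumed.
        self._pending.popleft()

    def remaining(self):
        return len(self._pending)


def process_next(source, write):
    """Deliver one message; acknowledge it only after `write` succeeds.

    Returns True if a message was written and acknowledged, False otherwise.
    """
    message = source.peek()
    if message is None:
        return False
    try:
        write(message)
    except OSError:
        # No acknowledgement: the message stays at the source and is replayed
        # on the next attempt instead of being held in a local repository.
        return False
    source.ack()
    return True


if __name__ == "__main__":
    written = []
    attempts = {"count": 0}

    def flaky_write(message):
        # Simulated destination that fails on its first call only.
        attempts["count"] += 1
        if attempts["count"] == 1:
            raise OSError("destination unavailable")
        written.append(message)

    source = ReplayableSource(["record-1"])
    print(process_next(source, flaky_write))  # False: write failed, not acknowledged
    print(process_next(source, flaky_write))  # True: replayed, written, acknowledged
    print(written, source.remaining())        # ['record-1'] 0
```

The design choice this illustrates is that durability comes from the source: nothing is removed from it until the destination confirms the write, so a crash between the two steps only causes a replay.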
Note: Provenance, metrics, and logs are not extracted at this time. Docker and other container engines can be used to collect logs and metrics.
Build:
mvn package -P docker
Docker image will be tagged apache/nifi-stateless:1.10.0-SNAPSHOT-dockermaven
Usage:
After building, the image can be used as follows:
docker run <options> apache/nifi-stateless:1.10.0-SNAPSHOT-dockermaven <arguments>
Stateless NiFi flows can also be run using nifi.sh:
./bin/nifi.sh stateless <arguments>
The <arguments> dictate the runtime to use:
1) RunFromRegistry [Once|Continuous] --json <JSON>
RunFromRegistry [Once|Continuous] --file <File Name> # Filename of JSON file that matches the examples below.
2) RunYARNServiceFromRegistry <YARN RM URL> <Docker Image Name> <Service Name> <# of Containers> --json <JSON>
RunYARNServiceFromRegistry <YARN RM URL> <Docker Image Name> <Service Name> <# of Containers> --file <File Name>
3) RunOpenwhiskActionServer <Port>
Examples:
1) ${NIFI_HOME}/bin/nifi.sh stateless RunFromRegistry Once --file /Users/nifi/nifi-stateless-configs/flow-abc.json
2) docker run --rm -it apache/nifi-stateless:1.10.0-SNAPSHOT-dockermaven \
RunFromRegistry Once --json "`cat /Users/nifi/nifi-stateless-configs/flow-abc.json`"
3) docker run --rm -it -v /Users/nifi/nifi-stateless-configs/kafka-to-solr.json:/home/nifi/flow.json apache/nifi-stateless:1.10.0-SNAPSHOT-dockermaven \
RunYARNServiceFromRegistry http://127.0.0.1:8088 apache/nifi-stateless:latest kafka-to-solr 3 --file /home/nifi/flow.json
4) docker run -d apache/nifi-stateless:1.10.0-SNAPSHOT-dockermaven \
RunOpenwhiskActionServer 8080
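Example 2 relies on shell command substitution (`cat`) to pass the JSON after --json. When launching from a program instead, the same command can be built as an argument list so that no shell quoting is involved. This is a hypothetical Python sketch: `run_from_registry_command` is an invented helper, the image tag is the one produced by the build above, and `-it` is omitted because the container is not attached to an interactive terminal here.

```python
import json
import shlex

IMAGE = "apache/nifi-stateless:1.10.0-SNAPSHOT-dockermaven"


def run_from_registry_command(config, mode="Once", image=IMAGE):
    """Build argv for: docker run --rm <image> RunFromRegistry <mode> --json <JSON>."""
    if mode not in ("Once", "Continuous"):
        raise ValueError("mode must be 'Once' or 'Continuous'")
    # json.dumps yields the single string that follows --json. Because argv is a
    # list, the JSON is passed as one argument without any shell escaping.
    return ["docker", "run", "--rm", image,
            "RunFromRegistry", mode, "--json", json.dumps(config)]


if __name__ == "__main__":
    config = {
        "registryUrl": "http://localhost:18080",
        "bucketId": "3aa885db-30c8-4c87-989c-d32b8ea1d3d8",
        "flowId": "0d219eb8-419b-42ba-a5ee-ce07445c6fc5",
    }
    argv = run_from_registry_command(config)
    # Print a shell-quoted form for inspection (shlex.join needs Python 3.8+).
    print(shlex.join(argv))
    # To actually start the container:
    # import subprocess; subprocess.run(argv, check=True)
```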
Notes:
1) The configuration file must be in JSON format.
2) When providing configurations via JSON, the following attributes must be provided: registryUrl, bucketId, flowId.
3) When running in Docker, the configuration can be provided either as a string or by mounting the file into the Docker container, for example with the "-v" option.
JSON Format
The JSON that is provided, either via the --json command-line argument or the --file command-line argument, has the following elements:
- registryUrl: The URL of the NiFi Registry that should be used for pulling the Flow.
- bucketId: The UUID of the Bucket containing the flow.
- flowId: The UUID of the flow to run.
- flowVersion: Optional - The version of the flow to run. If not present or equal to -1, then the latest version of the flow will be used.
- materializeContent: Optional - Whether or not the contents of the FlowFile should be stored in Java Heap so that they can be read multiple times. If this value is false, the contents of any input FlowFile will be read as a stream of data and not buffered into heap. However, this means that the contents can be read only one time. This can be useful, for example, when transferring large files from HDFS to another HDFS instance or directory with a simple flow such as ListHDFS -> FetchHDFS -> PutHDFS. In this flow, the contents of the files will be buffered into Java Heap if the value of this argument is true but will not be if the value of this argument is false.
- failurePortIds: Optional - An array of Port UUIDs, such that if any data is sent to one of the ports with these IDs, the flow is considered "failed" and will stop immediately.
- ssl: Optional - If present, provides SSL keystore and truststore information that can be used for interacting with the NiFi Registry and for Site-to-Site communications for Remote Process Groups.
- flowFiles: Optional - An array of FlowFiles that should be provided to the flow's Input Port. Each element in the array is a JSON object, which can have multiple keys. If a key is nifi_content, its String value becomes the FlowFile's content; every other key/value pair is treated as an attribute of the FlowFile.
- parameters: Optional - Key-value pairs (or objects, if sensitive) that will be passed to the NiFi Flow as parameters.
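The element rules above can be summarized in a short parsing sketch. This is a hypothetical Python helper (`parse_config` is an invented name), not the parser Stateless NiFi uses. It assumes the required elements are registryUrl, bucketId and flowId (the ones in the minimal sample), maps an absent or -1 flowVersion to None for "latest", and splits each flowFiles element into content (the nifi_content value, or None if absent, which is this sketch's own choice) and attributes (every other key).

```python
import json

REQUIRED_KEYS = ("registryUrl", "bucketId", "flowId")


def parse_config(text):
    """Parse a Stateless NiFi JSON configuration string (illustrative only)."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing required element(s): " + ", ".join(missing))

    # flowVersion is optional; absent or -1 selects the latest version (None here).
    version = int(config.get("flowVersion", -1))
    flow_version = None if version == -1 else version

    flow_files = []
    for element in config.get("flowFiles", []):
        # nifi_content holds the FlowFile content; all other keys are attributes.
        content = element.get("nifi_content")
        attributes = {k: v for k, v in element.items() if k != "nifi_content"}
        flow_files.append((content, attributes))

    return {
        "registryUrl": config["registryUrl"],
        "bucketId": config["bucketId"],
        "flowId": config["flowId"],
        "flowVersion": flow_version,
        "failurePortIds": list(config.get("failurePortIds", [])),
        "flowFiles": flow_files,
    }
```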
Minimal JSON Sample:
{
"registryUrl": "http://localhost:18080",
"bucketId": "3aa885db-30c8-4c87-989c-d32b8ea1d3d8",
"flowId": "0d219eb8-419b-42ba-a5ee-ce07445c6fc5"
}
Full JSON Sample:
{
"registryUrl": "https://localhost:9443",
"bucketId": "3aa885db-30c8-4c87-989c-d32b8ea1d3d8",
"flowId": "0d219eb8-419b-42ba-a5ee-ce07445c6fc5",
"flowVersion": 8,
"materializeContent":true,
"failurePortIds": ["f25c9204-6c95-3aa9-b0a8-c556f5f61849"],
"ssl": {
"keystoreFile": "/etc/security/keystore.jks",
"keystorePass": "apachenifi",
"keyPass": "nifiapache",
"keystoreType": "JKS",
"truststoreFile": "/etc/security/truststore.jks",
"truststorePass": "apachenifi",
"truststoreType": "JKS"
},
"flowFiles":[{
"absolute.path": "/tmp/nifistateless/input/",
"filename": "test.txt",
"nifi_content": "hello"
},
{
"absolute.path": "/tmp/nifistateless/input/",
"filename": "test2.txt",
"nifi_content": "hi"
}],
"parameters": {
"DestinationDirectory" : "/tmp/nifistateless/output2/",
"Username" : "jdoe",
"Password": { "sensitive": "true", "value": "password" }
}
}
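In the full sample, a parameter is either a plain string or an object carrying "sensitive" and "value". A caller that wants to log parameters without leaking secrets might normalize them as below. This is a hypothetical Python sketch (`normalize_parameters` is an invented name); it accepts both the string "true" used in the sample and a JSON boolean true, which is an assumption about what the runtime tolerates.

```python
import json


def normalize_parameters(parameters):
    """Split a 'parameters' object into (values, sensitive_names)."""
    values, sensitive = {}, set()
    for name, spec in parameters.items():
        if isinstance(spec, dict):
            # Object form: the value lives under "value"; "sensitive" marks secrets.
            values[name] = spec.get("value")
            if str(spec.get("sensitive", False)).lower() == "true":
                sensitive.add(name)
        else:
            # Plain form: a non-sensitive parameter value.
            values[name] = spec
    return values, sensitive


if __name__ == "__main__":
    sample = json.loads("""{
      "DestinationDirectory" : "/tmp/nifistateless/output2/",
      "Username" : "jdoe",
      "Password": { "sensitive": "true", "value": "password" }
    }""")
    values, sensitive = normalize_parameters(sample)
    print(sorted(sensitive))  # ['Password']
    # Mask sensitive values before logging them.
    print({k: ("****" if k in sensitive else v) for k, v in values.items()})
```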
TODO:
- Provenance is always recorded instead of waiting for commit, so a rollback could result in duplicates:
  - the StatelessProvenanceReporter.send force option is not honored
  - the StatelessProcessSession.adjustCounter immediate option is not honored
- Send logs, metrics, and provenance to Kafka/Solr (configure a flow ID for each?)
- Counters
- Tests
- Processor and port IDs from the UI do not match IDs in templates or the registry