Mirror of https://github.com/apache/nifi.git (synced 2025-03-01 15:09:11 +00:00)

NIFI-13527 Corrected various grammar issues in comments and docs

This closes #9060

Signed-off-by: David Handermann <exceptionfactory@apache.org>

commit 3f6eb6ff63
parent f92d8f72a8
@@ -49,7 +49,7 @@
 Configuring PublishAMQP:
 </p>
 <ol>
-<li><b>Queue</b> - [REQUIRED] the name of AMQP queue the messages will retrieved from. Usually provided by administrator (e.g., 'amq.direct')
+<li><b>Queue</b> - [REQUIRED] the name of AMQP queue the messages will be retrieved from. Usually provided by administrator (e.g., 'amq.direct')
 </li>
 <li><b>Host Name</b> - [REQUIRED] the name of the host where AMQP broker is running.
 Usually provided by administrator (e.g., 'myhost.com'). Defaults to 'localhost'.
@@ -63,7 +63,7 @@
 <li><b>Password</b> - [REQUIRED] password to use with user name to connect to AMQP broker.
 Usually provided by the administrator. Defaults to 'guest'.
 </li>
-<li><b>Use Certificate Authentication</b> - [OPTIONAL] whether or not to use the SSL certificate common name for authentication rather than user name/password.
+<li><b>Use Certificate Authentication</b> - [OPTIONAL] Use the SSL certificate common name for authentication rather than user name/password.
 This can only be used in conjunction with SSL. Defaults to 'false'.
 </li>
 <li><b>Virtual Host</b> - [OPTIONAL] Virtual Host name which segregates AMQP system for enhanced security.
@@ -84,7 +84,7 @@
 <li><b>Password</b> - [REQUIRED] password to use with user name to connect to AMQP broker.
 Usually provided by the administrator. Defaults to 'guest'.
 </li>
-<li><b>Use Certificate Authentication</b> - [OPTIONAL] whether or not to use the SSL certificate common name for authentication rather than user name/password.
+<li><b>Use Certificate Authentication</b> - [OPTIONAL] Use the SSL certificate common name for authentication rather than user name/password.
 This can only be used in conjunction with SSL. Defaults to 'false'.
 </li>
 <li><b>Virtual Host</b> - [OPTIONAL] Virtual Host name which segregates AMQP system for enhanced security.
@@ -32,7 +32,7 @@

 <h3>Description</h3>
 <p>
-This processor collects various objects (eg. tasks, comments, etc...) from Asana via the specified
+This processor collects various objects (e.g. tasks, comments, etc...) from Asana via the specified
 <code>AsanaClientService</code>. When the processor started for the first time with a given configuration
 it collects each of the objects matching the user specified criteria, and emits <code>FlowFile</code>s
 of each on the <code>NEW</code> relationship. Then, it polls Asana in the frequency of the configured <em>Run Schedule</em>
@@ -58,7 +58,7 @@
 It usually guesses the name of this class correctly from <i>Root Model Name</i>.<br/>
 However there may be situations where this is not the case.
 Should this happen, one can take use of the fact that NiFi logs the temporary directory where the compiled Java classes can be found.
-Once the proper class of the root model type is identified in that directory (should be easily done by looking for it by it's name)
+Once the proper class of the root model type is identified in that directory (should be easily done by looking for it by its name)
 it can be provided directly via the <i>Root Model Class Name</i> property.
 (Note however that the service should be left <i>Enabled</i> while doing the search as it deletes the temporary directory when it is disabled.
 To be able to set the property the service needs to be disabled in the end - and let it remove the directory,
@@ -31,7 +31,7 @@
 <h2>Data types</h2>

 <p>
-The list data types supported by DynamoDB does not fully overlaps with the capabilities of the Record data structure.
+The list data types supported by DynamoDB does not fully overlap with the capabilities of the Record data structure.
 Some conversions and simplifications are necessary during inserting the data. These are:
 </p>

@@ -49,7 +49,7 @@

 <p>
 Working with DynamoDB when batch inserting comes with two inherit limitations. First, the number of inserted Items is limited to 25 in any case.
-In order to overcome this, during one execution, depending on the number or records in the incoming FlowFile, <i>PutDynamoDBRecord</i> might attempt multiple
+In order to overcome this, during one execution, depending on the number of records in the incoming FlowFile, <i>PutDynamoDBRecord</i> might attempt multiple
 insert calls towards the database server. Using this approach, the flow does not have to work with this limitation in most cases.
 </p>

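The 25-item batch limit discussed in the hunk above is easy to picture with a short, purely illustrative Groovy sketch; this only mirrors the arithmetic described in the documentation and is not NiFi's implementation:

// Illustrative only: a record set larger than DynamoDB's 25-item batch limit
// ends up split across several insert calls, as the documentation describes.
def records = (1..60).collect { [id: it] }
def batches = records.collate(25)      // Groovy: split into chunks of at most 25 items
assert batches*.size() == [25, 25, 10] // one FlowFile of 60 records -> three insert calls
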
@@ -63,7 +63,7 @@

 <p>
 The most common reason for this behaviour comes from the other limitation the inserts have with DynamoDB: the database has a build in supervision over the amount of inserted data.
-When a client reaches the "throughput limit", the server refuses to process the insert request until a certain amount of time. More information on this might be find <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html">here</a>.
+When a client reaches the "throughput limit", the server refuses to process the insert request until a certain amount of time. More information <a href="https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html">here</a>.
 From the perspective of the <i>PutDynamoDBRecord</i> we consider these cases as temporary issues and the FlowFile will be transferred to the "unprocessed" Relationship after which the processor will yield in order to avoid further throughput issues.
 (Other kinds of failures will result transfer to the "failure" Relationship)
 </p>
@@ -87,13 +87,13 @@
 <h4>Partition By Field</h4>

 <p>
-The processors assigns one of the record fields as partition key. The name of the record field is specified by the "Partition Key Field" property and the value will be the value of the record field with the same name.
+The processors assign one of the record fields as partition key. The name of the record field is specified by the "Partition Key Field" property and the value will be the value of the record field with the same name.
 </p>

 <h4>Partition By Attribute</h4>

 <p>
-The processor assigns the value of a FlowFile attribute as partition key. With this strategy all the Items within a FlowFile will share the same partition key value and it is suggested to use for tables also having a sort key in order to meet the primary key requirements of the DynamoDB.
+The processor assigns the value of a FlowFile attribute as partition key. With this strategy all the Items within a FlowFile will share the same partition key value, and it is suggested to use for tables also having a sort key in order to meet the primary key requirements of the DynamoDB.
 The property "Partition Key Field" defines the name of the Item field and the property "Partition Key Attribute" will specify which attribute's value will be assigned to the partition key.
 With this strategy the "Partition Key Field" must be different from the fields consisted by the incoming records.
 </p>
@@ -118,7 +118,7 @@
 <h4>Sort By Field</h4>

 <p>
-The processors assigns one of the record fields as sort key. The name of the record field is specified by the "Sort Key Field" property and the value will be the value of the record field with the same name.
+The processors assign one of the record fields as sort key. The name of the record field is specified by the "Sort Key Field" property and the value will be the value of the record field with the same name.
 With this strategy the "Sort Key Field" must be different from the fields consisted by the incoming records.
 </p>

@@ -25,11 +25,6 @@
 <h1>Amazon Transcribe</h1>
 <p>
 Automatically convert speech to text
-<ul>
-<li>Extract key business insights from customer calls, video files, clinical conversations, and more.</li>
-<li>Improve business outcomes with state of the art speech recognition models that are fully managed and continuously trained.</li>
-<li>Ensure customer privacy and safety by masking sensitive information.</li>
-</ul>
 </p>

 <h3>Usage</h3>
@@ -25,11 +25,6 @@
 <h1>Amazon Transcribe</h1>
 <p>
 Automatically convert speech to text
-<ul>
-<li>Extract key business insights from customer calls, video files, clinical conversations, and more.</li>
-<li>Improve business outcomes with state of the art speech recognition models that are fully managed and continuously trained.</li>
-<li>Ensure customer privacy and safety by masking sensitive information.</li>
-</ul>
 </p>

 <h3>Usage</h3>
@@ -26,7 +26,7 @@
 <p>
 Amazon Translate is a neural machine translation service for translating text to and from English across a breadth of supported languages.
 Powered by deep-learning technologies, Amazon Translate delivers fast, high-quality, and affordable language translation.
-It provides a managed, continually trained solution so you can easily translate company and user-authored content or build applications that require support across multiple languages.
+It provides a managed, continually trained solution, so you can easily translate company and user-authored content or build applications that require support across multiple languages.
 The machine translation engine has been trained on a wide variety of content across different domains to produce quality translations that serve any industry need.
 </p>

@@ -26,7 +26,7 @@
 <p>
 Amazon Translate is a neural machine translation service for translating text to and from English across a breadth of supported languages.
 Powered by deep-learning technologies, Amazon Translate delivers fast, high-quality, and affordable language translation.
-It provides a managed, continually trained solution so you can easily translate company and user-authored content or build applications that require support across multiple languages.
+It provides a managed, continually trained solution, so you can easily translate company and user-authored content or build applications that require support across multiple languages.
 The machine translation engine has been trained on a wide variety of content across different domains to produce quality translations that serve any industry need.
 </p>

@@ -47,7 +47,7 @@
 <p>
 To solve this, the ListS3 Processor can optionally be configured with a Record Writer. When a Record Writer is configured, a single
 FlowFile will be created that will contain a Record for each object in the bucket, instead of a separate FlowFile per object.
-See the documentation for ListFile for an example of how to build a dataflow that allows for processing all of the objects before proceeding
+See the documentation for ListFile for an example of how to build a dataflow that allows for processing all the objects before proceeding
 with any other step.
 </p>

@@ -56,7 +56,7 @@
 This Elasticsearch client relies on a <code>RestClient</code> using the Apache HTTP Async Client. By default, it will start one
 dispatcher thread, and a number of worker threads used by the connection manager. There will be as many worker thread as the number
 of locally detected processors/cores on the NiFi host. Consequently, it is highly recommended to have only one instance of this
-controller service per remote Elasticsearch destination and have this controller service shared across all of the Elasticsearch
+controller service per remote Elasticsearch destination and have this controller service shared across all the Elasticsearch
 processors of the NiFi flows. Having a very high number of instances could lead to resource starvation and result in OOM errors.
 </p>

@@ -35,7 +35,7 @@
 <em>name</em>.
 </p>
 <p>
-The query that is assembled from these is a boolean query where all of the criteria are under the <em>must</em> list.
+The query that is assembled from these is a boolean query where all the criteria are under the <em>must</em> list.
 In addition, wildcards are not supported right now and all criteria are translated into literal <em>match</em> queries.
 </p>
 <h2>Post-Processing</h2>
@@ -31,7 +31,7 @@
 will only be split at the top level because nested aggregations lose their context (and thus lose their value) if
 separated from their parent aggregation. Additionally, the results from all pages can be combined into a single
 flowfile (but the processor will only load each page of data into memory at any one time).</p>
-<p>The following is an example query that would be created for tracking an "@timestamp" field:</p>
+<p>The following is an example query that would be created for tracking a "@timestamp" field:</p>
 <pre>
 {
 "query": {
@@ -31,7 +31,7 @@
 }
 }
 </pre>
-<p>To delete all of the contents of an index, this could be used:</p>
+<p>To delete all the contents of an index, this could be used:</p>
 <pre>
 {
 "query": {
@@ -44,7 +44,7 @@
 record path operations that find an index or type value in the record set.
 The ID and operation type (create, index, update, upsert or delete) can also be extracted in a similar fashion from
 the record set.
-An "@timestamp" field can be added to the data either using a default or by extracting it from the record set.
+A "@timestamp" field can be added to the data either using a default or by extracting it from the record set.
 This is useful if the documents are being indexed into an Elasticsearch Data Stream.
 </p>
 <h4>Example - per-record actions</h4>
@@ -36,7 +36,7 @@
 }
 }
 </pre>
-<p>To update all of the contents of an index, this could be used:</p>
+<p>To update all the contents of an index, this could be used:</p>
 <pre>
 {
 "query": {
@@ -49,7 +49,7 @@
 </ul>
 </p>
 <p>
-Another useful property is <b>mail.debug</b> which allows Java Mail API to print protocol messages to the console helping you to both understand what's going on as well as debug issues.
+Another useful property is <b>mail.debug</b> which allows Java Mail API to print protocol messages to the console helping you to both understand what's going on and debug issues.
 </p>
 <p>
 For the full list of available Java Mail properties please refer to <a href="https://javaee.github.io/javamail/docs/api/com/sun/mail/imap/package-summary.html">here</a>
@@ -48,7 +48,7 @@
 </ul>
 </p>
 <p>
-Another useful property is <b>mail.debug</b> which allows Java Mail API to print protocol messages to the console helping you to both understand what's going on as well as debug issues.
+Another useful property is <b>mail.debug</b> which allows Java Mail API to print protocol messages to the console helping you to both understand what's going on and debug issues.
 </p>
 <p>
 For the full list of available Java Mail properties please refer to <a href="https://javaee.github.io/javamail/docs/api/com/sun/mail/pop3/package-summary.html">here</a>
@@ -168,7 +168,7 @@ public abstract class AbstractBigQueryProcessor extends AbstractGCPProcessor<Big
 }

 /**
-* If sub-classes needs to implement any custom validation, override this method then add
+* If subclasses needs to implement any custom validation, override this method then add
 * validation result to the results.
 */
 protected void customValidate(ValidationContext validationContext, Collection<ValidationResult> results) {
@@ -47,7 +47,7 @@
 <p>
 To solve this, the ListGCSBucket Processor can optionally be configured with a Record Writer. When a Record Writer is configured, a single
 FlowFile will be created that will contain a Record for each object in the bucket, instead of a separate FlowFile per object.
-See the documentation for ListFile for an example of how to build a dataflow that allows for processing all of the objects before proceeding
+See the documentation for ListFile for an example of how to build a dataflow that allows for processing all the objects before proceeding
 with any other step.
 </p>

@@ -27,7 +27,7 @@
 <h3>Usage</h3>
 <p>
 GetGcpVisionAnnotateFilesOperationStatus is designed to periodically check the statuses of file annotation operations. This processor should be used in pair with StartGcpVisionAnnotateFilesOperation Processor.
-An outgoing FlowFile contains the raw response returned by the Vision server. This response is in JSON json format and contains a google storage reference where the result is located, as well as additional metadata, as written in the <a href="https://cloud.google.com/vision/docs/reference/rest/v1/locations.operations#Operation" target="_blank">Google Vision API Reference document</a>.
+An outgoing FlowFile contains the raw response returned by the Vision server. This response is in JSON format and contains a Google storage reference where the result is located, as well as additional metadata, as written in the <a href="https://cloud.google.com/vision/docs/reference/rest/v1/locations.operations#Operation" target="_blank">Google Vision API Reference document</a>.
 </p>

 </body>
@@ -27,7 +27,7 @@
 <h3>Usage</h3>
 <p>
 GetGcpVisionAnnotateImagesOperationStatus is designed to periodically check the statuses of image annotation operations. This processor should be used in pair with StartGcpVisionAnnotateImagesOperation Processor.
-An outgoing FlowFile contains the raw response returned by the Vision server. This response is in JSON json format and contains a google storage reference where the result is located, as well as additional metadata, as written in the <a href="https://cloud.google.com/vision/docs/reference/rest/v1/locations.operations#Operation" target="_blank">Google Vision API Reference document</a>.
+An outgoing FlowFile contains the raw response returned by the Vision server. This response is in JSON format and contains a Google storage reference where the result is located, as well as additional metadata, as written in the <a href="https://cloud.google.com/vision/docs/reference/rest/v1/locations.operations#Operation" target="_blank">Google Vision API Reference document</a>.
 </p>

 </body>
@@ -75,7 +75,7 @@
 </ul>
 You can find more details at <a href="https://cloud.google.com/vision/docs/features-list" target="_blank">Google Vision Feature List</a>

-<h3>Example: How to setup a simple Annotate Image Flow</h3>
+<h3>Example: How to set up a simple Annotate Image Flow</h3>
 <p>
 Prerequisites
 </p>
@@ -76,7 +76,7 @@
 <li>etc.</li>
 </ul>
 You can find more details at <a href="https://cloud.google.com/vision/docs/features-list" target="_blank">Google Vision Feature List</a>
-<h3>Example: How to setup a simple Annotate Image Flow</h3>
+<h3>Example: How to set up a simple Annotate Image Flow</h3>
 <p>
 Prerequisites
 </p>
@@ -35,7 +35,7 @@
 <h2>Script Submission</h2>
 <p>
 Script submission is the default way to interact with the gremlin server. This takes the input script and uses <a href="https://tinkerpop.apache.org/docs/current/reference/#gremlin-go-scripts">Script Submission</a>
-to interact with the gremlin server. Because the script is shipped to the gremlin server as a string, only simple queries are recommended (count, path, etc)
+to interact with the gremlin server. Because the script is shipped to the gremlin server as a string, only simple queries are recommended (count, path, etc.)
 as there are no complex serializers available in this operation. This also means that NiFi will not be opinionated about what is returned, whatever the response from
 the tinkerpop server is, the response will be deserialized assuming common Java types. In the case of a Map return, the values
 will be returned as a record in the FlowFile response, in all other cases, the return of the query will be coerced into a
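The script-submission mode described in the hunk above can be pictured with a minimal Gremlin-Groovy sketch; this is purely illustrative (the `g` traversal source is a binding assumed to be provided by the remote Gremlin server) and returning a Map is chosen because, as the documentation states, Map results come back as a record in the response FlowFile:

// Hypothetical Gremlin-Groovy script submitted to the server as a plain string.
// 'g' is the traversal source bound by the Gremlin server; only simple aggregates are used.
[vertexCount: g.V().count().next(),
 edgeCount  : g.E().count().next()]
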
@@ -23,7 +23,7 @@

 <body>
 <h2>Summary</h2>
-<p>This is a grooviest groovy script :)</p>
+<p>This is the grooviest groovy script :)</p>
 <h2>Script Bindings:</h2>
 <table>
 <tr><th>variable</th><th>type</th><th>description</th></tr>
@@ -132,7 +132,7 @@ ListHDFS configuration:
 <p>
 To solve this, the ListHDFS Processor can optionally be configured with a Record Writer. When a Record Writer is configured, a single
 FlowFile will be created that will contain a Record for each file in the directory, instead of a separate FlowFile per file.
-See the documentation for ListFile for an example of how to build a dataflow that allows for processing all of the files before proceeding
+See the documentation for ListFile for an example of how to build a dataflow that allows for processing all the files before proceeding
 with any other step.
 </p>

@@ -26,7 +26,7 @@

 <p>
 The ./conf/bootstrap-hashicorp-vault.conf file that comes with Apache NiFi is a convenient way to configure this
-controller service in a manner consistent with the HashiCorpVault sensitive properties provider. Since this file is already used for configuring
+controller service in a manner consistent with the HashiCorpVault sensitive property provider. Since this file is already used for configuring
 the Vault client for protecting sensitive properties in the NiFi configuration files
 (see the <a href="../../../../../html/administration-guide.html#hashicorp-vault-providers">Administrator's Guide</a>),
 it's a natural starting point for configuring the controller service as well.
@@ -32,7 +32,7 @@

 <p>
 The implementation supports the atomic method family defined in AtomicDistributedMapCacheClient. This is achieved by maintaining
-a revision number for every entry. The revision is a 8 byte long integer. It is increased when the entry is updated. The value is kept
+a revision number for every entry. The revision is an 8 byte long integer. It is increased when the entry is updated. The value is kept
 during modifications not part of the atomic method family but this is mainly for regular management of the entries. It is not
 recommended to work with elements by mixing the two method families.
 </p>
@@ -25,7 +25,7 @@

 <p>
 This service starts and manages an embedded Hazelcast instance. The cache manager has direct accesses to the
-instance - and the data stored in it. However, the instance sill opens a port for potential clients to join and
+instance - and the data stored in it. However, the instance opens a port for potential clients to join and
 this cannot be prevented. Note that this might leave the instance open for rogue clients to join.
 </p>

@@ -25,7 +25,7 @@

 <p>
 This service connects to an external Hazelcast cluster (or standalone instance) as client. Hazelcast 4.0.0 or newer version is required.
-The connection to the server is kept alive using Hazelcast's built in reconnection capability. This might be fine-tuned
+The connection to the server is kept alive using Hazelcast's built-in reconnection capability. This might be fine-tuned
 by setting the following properties:
 </p>

@@ -31,7 +31,7 @@
 </p>
 <p>
 All JMS vendors and <i>ConnectionFactory</i> implementations are supported as long as the configuration values can
-be set through <i>set</i> methods (detailed explanation in the last paragraph). However some helpful accommodation
+be set through <i>set</i> methods (detailed explanation in the last paragraph). However, some helpful accommodation
 are done for the following JMS vendors:
 </p>
 <ul>
@@ -64,7 +64,7 @@ There are multiple ways to configure the Connection Factory for the processor:
 </ul>
 <p>
 The preferred way is to use the Connection Factory Service property and a pre-configured controller service. It is also the most convenient method, because it is enough
-to configure the controller service once and then it can be used in multiple processors.
+to configure the controller service once, and then it can be used in multiple processors.
 </p>
 <p>
 However, some JMS client libraries may not work with the controller services due to incompatible Java ClassLoader handling between the 3rd party JMS client library and NiFi.
@@ -56,7 +56,7 @@
 The update key method takes keys from the document and builds a query from them. It will attempt to parse the <em>_id</em>
 field as an <em>ObjectID</em> type if that is one of the keys that is specified in the configuration field. Multiple keys
 can be specified by separating them with commas. This configuration field supports Expression Language, so it can
-derived in part or entirely from flowfile attributes.
+be derived in part or entirely from flowfile attributes.
 </p>

 <h3>Update Query</h3>
@@ -64,7 +64,7 @@
 The update query method takes a valid JSON document as its value and uses it to find one or more documents to update.
 This field supports Expression Language, so it can be derived in part or entirely from flowfile attributes. It is possible,
 for instance, to put an attribute named <em>update_query</em> on a flowfile and specify <em>${update_query}</em> in the
-configuration field so it will load the value from the flowfile.
+configuration field, so it will load the value from the flowfile.
 </p>

 <h3>Upserts</h3>
@@ -22,7 +22,7 @@
 </head>
 <body>
 <p>The MQTT messages are always being sent to subscribers on a topic regardless of how frequently the processor is scheduled to run.
-If the 'Run Schedule' is significantly behind the rate at which the messages are arriving to this processor, then a back up can occur
+If the 'Run Schedule' is significantly behind the rate at which the messages are arriving to this processor, then a back-up can occur
 in the internal queue of this processor. Each time the processor is scheduled, the messages in the internal queue will be written to
 FlowFiles. In case the internal queue is full, the MQTT client will try for up to 1 second to add the message into the internal queue.
 If the internal queue is still full after this time, an exception saying that 'The subscriber queue is full' would be thrown, the
@@ -104,10 +104,10 @@ Jane, Ten
 When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
 to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
 values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
-than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types with the Exception of MAP, RECORD, ARRAY, and CHOICE.
+than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types except MAP, RECORD, ARRAY, and CHOICE.
 </li>
 <li>
-Before inferring the type of a value, leading and trailing whitespace are removed. Additionally, if the value is surrounded by double-quotes ("), the double-quotes
+Before inferring the type of value, leading and trailing whitespace are removed. Additionally, if the value is surrounded by double-quotes ("), the double-quotes
 are removed. Therefore, the value <code>16</code> is interpreted the same as <code> "16"</code>. Both will be interpreted as an INT. However, the value
 <code>" 16"</code> will be inferred as a STRING type because the white space is enclosed within double-quotes, which means that the white space is considered
 part of the value.
@@ -129,7 +129,7 @@
 <p>
 Protocol Buffers offers further Message types called Well-Known Types. These are additionally provided messages that defines
 complex structured types and wrappers for scalar types. The Any type is one of these Well-Known Types which is used to store an arbitrary
-serialized Message along with an URL that describes the type of the serialized Message. Since the Message type and the embedded Message will be
+serialized Message along with a URL that describes the type of the serialized Message. Since the Message type and the embedded Message will be
 available only when the Any Message is already populated with data, the ProtobufReader needs to do this Message processing at data conversion time.
 The Reader is capable to generate schema for the embedded Message in the Any field and replace it in the result Record schema.
 </p>
@@ -128,7 +128,7 @@ if(!flowFileList.isEmpty()) {
 <p> </p>
 <p><strong>Create a new FlowFile from a parent FlowFile</strong></p>
 <p><strong>Use Case</strong>: You want to generate new FlowFile(s) based on an incoming FlowFile.</p>
-<p><strong>Approach</strong>: Use the create(<em>parentFlowFile</em>) method from the session object. This method takes a parent FlowFile reference and returns a new child FlowFile object. The newly created FlowFile will inherit all of the parent's attributes except for the UUID. This method will automatically generate a Provenance FORK event or a Provenance JOIN event, depending on whether or not other FlowFiles are generated from the same parent before the ProcessSession is committed.</p>
+<p><strong>Approach</strong>: Use the create(<em>parentFlowFile</em>) method from the session object. This method takes a parent FlowFile reference and returns a new child FlowFile object. The newly created FlowFile will inherit all the parent's attributes except for the UUID. This method will automatically generate a Provenance FORK event or a Provenance JOIN event, depending on whether other FlowFiles are generated from the same parent before the ProcessSession is committed.</p>
 <p><strong>Examples</strong>:</p>
 <p><em>Groovy</em></p>
 <pre>flowFile = session.get()
@@ -213,7 +213,7 @@ else {
 <p> </p>
 <p><strong>Read the contents of an incoming FlowFile using a callback</strong></p>
 <p><strong>Use Case</strong>: You have incoming connection(s) to ExecuteScript and want to retrieve the contents of a FlowFile from the queue(s) for processing.</p>
-<p><strong>Approach</strong>: Use the read(<em>flowFile</em>, <em>inputStreamCallback</em>) method from the session object. An InputStreamCallback object is needed to pass into the read() method. Note that because InputStreamCallback is an object, the contents are only visible to that object by default. If you need to use the data outside the read() method, use a more globally-scoped variable. The examples will store the full contents of the incoming FlowFile into a String (using Apache Commons' IOUtils class). NOTE: For large FlowFiles, this is not the best technique; rather you should read in only as much data as you need, and process that as appropriate. For something like SplitText, you could read in a line at a time and process it within the InputStreamCallback, or use the session.read(flowFile) approach mentioned earlier to get an InputStream reference to use outside of a callback.</p>
+<p><strong>Approach</strong>: Use the read(<em>flowFile</em>, <em>inputStreamCallback</em>) method from the session object. An InputStreamCallback object is needed to pass into the read() method. Note that because InputStreamCallback is an object, the contents are only visible to that object by default. If you need to use the data outside the read() method, use a more globally-scoped variable. The examples will store the full contents of the incoming FlowFile into a String (using Apache Commons' IOUtils class). NOTE: For large FlowFiles, this is not the best technique; rather you should read in only as much data as you need, and process that as appropriate. For something like SplitText, you could read in a line at a time and process it within the InputStreamCallback, or use the session.read(flowFile) approach mentioned earlier to get an InputStream reference to use outside the callback.</p>
 <p><strong>Examples</strong>:</p>
 <p><em>Groovy</em></p>
 <pre>import org.apache.commons.io.IOUtils
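A compact version of the callback-based read described in that hunk is sketched below; it uses the standard ExecuteScript bindings and the Apache Commons IOUtils class mentioned in the documentation, and the 'text.length' attribute name is only illustrative:

// Minimal ExecuteScript (Groovy) sketch: read the whole FlowFile content inside an InputStreamCallback.
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import java.nio.charset.StandardCharsets

def flowFile = session.get()
if (flowFile != null) {
    def text = ''
    // the stream is only visible inside the callback, so copy what is needed into an outer variable
    session.read(flowFile, { inputStream ->
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    } as InputStreamCallback)
    // 'text.length' is an illustrative attribute name, not defined by NiFi
    flowFile = session.putAttribute(flowFile, 'text.length', String.valueOf(text.length()))
    session.transfer(flowFile, REL_SUCCESS)
}
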
@@ -39,7 +39,7 @@ td {text-align: left}
 </p>

 <p>
-The Processor expects an user defined script in order to determine which Records should be kept and filtered out.
+The Processor expects a user defined script in order to determine which Records should be kept and filtered out.
 When creating a script, it is important to note that, unlike ExecuteScript, this Processor does not allow the script itself to expose Properties to be configured or define Relationships.
 </p>

@@ -147,7 +147,7 @@ td {text-align: left}

 <p>
 A simple 5-minute benchmark was done to analyze the difference in performance. The script used simply modifies one field and return the Record otherwise unmodified.
-The results are shown below. Note that no specifics are given with regards to hardware, specifically because the results should not be used to garner expectations of
+The results are shown below. Note that no specifics are given regarding hardware, specifically because the results should not be used to garner expectations of
 absolute performance but rather to show relative performance between the different options.
 </p>

@@ -40,7 +40,7 @@ td {text-align: left}
 </p>

 <p>
-The Processor expects an user defined script in order to determine the validity of the Records.
+The Processor expects a user defined script in order to determine the validity of the Records.
 When creating a script, it is important to note that, unlike ExecuteScript, this Processor does not allow the script itself to expose Properties to be configured or define Relationships.
 </p>

@@ -25,14 +25,14 @@
 <p>
 The Site-to-Site Metrics Reporting Task allows the user to publish NiFi's metrics (as in the Ambari reporting task) to the
 same NiFi instance or another NiFi instance. This provides a great deal of power because it allows the user to make use of
-all of the different Processors that are available in NiFi in order to process or distribute that data.
+all the different Processors that are available in NiFi in order to process or distribute that data.
 </p>

 <h2>Ambari format</h2>

 <p>
 There are two available output formats. The first one is the Ambari format as defined in the Ambari Metrics Collector
-API which is a JSON with dynamic keys. If using this format you might be interested by the below Jolt specification to
+API which is a JSON with dynamic keys. If using this format you might be interested in the below Jolt specification to
 transform the data.
 </p>

@@ -23,9 +23,9 @@

 <body>
 <p>
-The Site-to-Site Provenance Reporting Task allows the user to publish all of the Provenance Events from a NiFi instance back to
+The Site-to-Site Provenance Reporting Task allows the user to publish all the Provenance Events from a NiFi instance back to
 the same NiFi instance or another NiFi instance. This provides a great deal of power because it allows the user to make use of
-all of the different Processors that are available in NiFi in order to process or distribute that data. When possible, it is
+all the different Processors that are available in NiFi in order to process or distribute that data. When possible, it is
 advisable to send the Provenance data to a different NiFi instance than the one that this Reporting Task is running on, because
 when the data is received over Site-to-Site and processed, that in and of itself will generate Provenance events. As a result, there
 is a cycle that is created. However, the data is sent in batches (1,000 by default). This means that for each batch of Provenance events
@@ -23,9 +23,9 @@
 <body>
 <h2>Summary</h2>
 <p>
-This processor polls a SNMP agent to get information for a given OID or OIDs (Strategy = GET) or for all the sub-tree
+This processor polls an SNMP agent to get information for a given OID or OIDs (Strategy = GET) or for all the subtree
 associated to a given OID or OIDs
-(Strategy = WALK). This processors supports SNMPv1, SNMPv2c and SNMPv3. The component is based on <a
+(Strategy = WALK). This processor supports SNMPv1, SNMPv2c and SNMPv3. The component is based on <a
 href="http://www.snmp4j.org/">SNMP4J</a>.
 </p>
 <p>
@@ -23,7 +23,7 @@
 <body>
 <h2>Summary</h2>
 <p>
-This processor sends SNMP set requests to a SNMP agent in order to update information associated to a given OID.
+This processor sends SNMP set requests to an SNMP agent in order to update information associated to a given OID.
 This processor supports
 SNMPv1, SNMPv2c and SNMPv3. The component is based on <a href="http://www.snmp4j.org/">SNMP4J</a>.
 </p>
@ -32,7 +32,7 @@
|
|||||||
looking for attributes
|
looking for attributes
|
||||||
prefixed with <i>snmp$</i>. If such an attribute is found, the attribute name is split using the $ character. The
|
prefixed with <i>snmp$</i>. If such an attribute is found, the attribute name is split using the $ character. The
|
||||||
second element must
|
second element must
|
||||||
respect the OID format to be considered as a valid OID. If there is a third element, it must represents the SMI
|
respect the OID format to be considered as a valid OID. If there is a third element, it must represent the SMI
|
||||||
Syntax integer value of
|
Syntax integer value of
|
||||||
the type of data associated to the given OID to allow a correct conversion. If there is no third element, the value
|
the type of data associated to the given OID to allow a correct conversion. If there is no third element, the value
|
||||||
is considered as a String
|
is considered as a String
|
||||||
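<p>
As a purely illustrative sketch (plain Java, not NiFi code), the attribute-name convention described above could be decomposed like this; the attribute name and OID in the example are invented, and the syntax code 4 is only a placeholder value:
</p>
<pre><code>
import java.util.regex.Pattern;

// Sketch only: decomposes an attribute name such as "snmp$1.3.6.1.2.1.1.5.0$4"
// into an OID and an optional SMI syntax code, following the convention above.
public class SnmpAttributeNameSketch {

    private static final Pattern OID = Pattern.compile("\\d+(\\.\\d+)*");

    static String describe(String attributeName) {
        String[] parts = attributeName.split("\\$");
        if (parts.length < 2 || !"snmp".equals(parts[0]) || !OID.matcher(parts[1]).matches()) {
            return "not a valid snmp$ attribute name";
        }
        if (parts.length >= 3) {
            // The third element is the SMI syntax code of the value's type.
            return "OID " + parts[1] + " with SMI syntax code " + parts[2];
        }
        // Without a third element the value is treated as a String.
        return "OID " + parts[1] + ", value treated as a String";
    }

    public static void main(String[] args) {
        System.out.println(describe("snmp$1.3.6.1.2.1.1.5.0$4"));
        System.out.println(describe("snmp$1.3.6.1.2.1.1.5.0"));
    }
}
</code></pre>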
|
@ -24,7 +24,7 @@
|
|||||||
<h2>PutSplunkHTTP</h2>
|
<h2>PutSplunkHTTP</h2>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
This processor serves as a counterpart for PutSplunk processor. While the later solves communication using TCP and
|
This processor serves as a counterpart for PutSplunk processor. While the latter solves communication using TCP and
|
||||||
UDP protocols, PutSplunkHTTP aims to send events into Splunk via HTTP or HTTPS. In this fashion, this processor
|
UDP protocols, PutSplunkHTTP aims to send events into Splunk via HTTP or HTTPS. In this fashion, this processor
|
||||||
shows similarities with GetSplunk processor and the properties relevant to the connection with Splunk server are
|
shows similarities with GetSplunk processor and the properties relevant to the connection with Splunk server are
|
||||||
identical. There are however some aspects unique for this processor:
|
identical. There are however some aspects unique for this processor:
|
||||||
|
@ -55,7 +55,7 @@
|
|||||||
|
|
||||||
<p>
|
<p>
|
||||||
Undetermined cases are normal in a healthy environment, as it is possible that NiFi asks for indexing status before Splunk
|
Undetermined cases are normal in a healthy environment, as it is possible that NiFi asks for indexing status before Splunk
|
||||||
finishes and acknowledges it. These cases are safe to retry and it is suggested to loop "undetermined" relationship
|
finishes and acknowledges it. These cases are safe to retry, and it is suggested to loop the "undetermined" relationship
|
||||||
back to the processor for a later try. FlowFiles transferred into the "Undetermined" relationship are penalized.
|
back to the processor for a later try. FlowFiles transferred into the "Undetermined" relationship are penalized.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -70,7 +70,7 @@
|
|||||||
<p>
|
<p>
|
||||||
Also, it is suggested to execute the query in batches. The "Maximum Query Size" property might be used to fine-tune
|
Also, it is suggested to execute the query in batches. The "Maximum Query Size" property might be used to fine-tune
|
||||||
the maximum number of events the processor will query about in one API request. This serves as an upper limit for the
|
the maximum number of events the processor will query about in one API request. This serves as an upper limit for the
|
||||||
batch but the processor might execute the query with less number of undetermined events.
|
batch but the processor might execute the query with fewer events.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
</body>
|
</body>
|
||||||
|
@ -33,7 +33,7 @@
|
|||||||
<p>
|
<p>
|
||||||
The results are transmitted to the destination using the configured Record Sink service, such as
|
The results are transmitted to the destination using the configured Record Sink service, such as
|
||||||
SiteToSiteReportingRecordSink (for sending via the Site-to-Site protocol) or DatabaseRecordSink (for sending the
|
SiteToSiteReportingRecordSink (for sending via the Site-to-Site protocol) or DatabaseRecordSink (for sending the
|
||||||
query result rows to an relational database).
|
query result rows to a relational database).
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
The reporting task can uniquely handle items from the bulletin and provenance repositories. This means that an item will only be processed once when the query is set to unique.
|
The reporting task can uniquely handle items from the bulletin and provenance repositories. This means that an item will only be processed once when the query is set to unique.
|
||||||
|
@ -55,7 +55,7 @@
|
|||||||
</table>
|
</table>
|
||||||
</p>
|
</p>
|
||||||
<p>If the Grouping Attribute property is specified, all rates are accumulated separately for unique values of the specified attribute. For example, assume Grouping Attribute property is
|
<p>If the Grouping Attribute property is specified, all rates are accumulated separately for unique values of the specified attribute. For example, assume Grouping Attribute property is
|
||||||
specified and the its value is "city". All FlowFiles containing a "city" attribute with value "Albuquerque" will have an accumulated rate calculated. A separate rate will be calculated
|
specified and its value is "city". All FlowFiles containing a "city" attribute with value "Albuquerque" will have an accumulated rate calculated. A separate rate will be calculated
|
||||||
for all FlowFiles containing a "city" attribute with a value "Boston". In other words, separate rate calculations will be accumulated for all unique values of the Grouping Attribute.
|
for all FlowFiles containing a "city" attribute with a value "Boston". In other words, separate rate calculations will be accumulated for all unique values of the Grouping Attribute.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
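<p>
A minimal sketch (plain Java, not NiFi code) of the per-group accumulation described above; the grouping values and sizes are invented for the example:
</p>
<pre><code>
import java.util.HashMap;
import java.util.Map;

// Sketch only: keeps a separate running total per unique value of the
// grouping attribute (here, "city"), mirroring the behaviour described above.
public class GroupedRateSketch {

    private final Map<String, Long> bytesPerGroup = new HashMap<>();

    void record(String groupValue, long flowFileSize) {
        // Each distinct grouping value accumulates independently.
        bytesPerGroup.merge(groupValue, flowFileSize, Long::sum);
    }

    public static void main(String[] args) {
        GroupedRateSketch rates = new GroupedRateSketch();
        rates.record("Albuquerque", 1_024);
        rates.record("Boston", 2_048);
        rates.record("Albuquerque", 512);
        // e.g. {Albuquerque=1536, Boston=2048} (map order is not guaranteed)
        System.out.println(rates.bytesPerGroup);
    }
}
</code></pre>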
|
@ -23,8 +23,8 @@
|
|||||||
<body>
|
<body>
|
||||||
<!-- Processor Documentation ================================================== -->
|
<!-- Processor Documentation ================================================== -->
|
||||||
<p>
|
<p>
|
||||||
When triggered, the processor loops through the appropriate response list (based on whether or not it
|
When triggered, the processor loops through the appropriate response list.
|
||||||
received a FlowFile). A response is produced the configured number of times for each pass through its
|
A response is produced the configured number of times for each pass through its
|
||||||
response list, as long as the processor is running.
|
response list, as long as the processor is running.
|
||||||
</p><p>
|
</p><p>
|
||||||
Triggered by a FlowFile, the processor can produce the following responses.
|
Triggered by a FlowFile, the processor can produce the following responses.
|
||||||
|
@ -155,7 +155,7 @@ This strategy would produce output that looks like this (assuming a JSON Writer):
|
|||||||
<h3>Insert Enrichment Fields</h3>
|
<h3>Insert Enrichment Fields</h3>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
The "Insert Enrichment Fields" strategy inserts all of the fields of the "enrichment" record into the original record. The records are correlated by their index in the FlowFile. That is,
|
The "Insert Enrichment Fields" strategy inserts all the fields of the "enrichment" record into the original record. The records are correlated by their index in the FlowFile. That is,
|
||||||
the first record in the "enrichment" FlowFile is inserted into the first record in the "original" FlowFile. The second record of the "enrichment" FlowFile is inserted into the second
|
the first record in the "enrichment" FlowFile is inserted into the first record in the "original" FlowFile. The second record of the "enrichment" FlowFile is inserted into the second
|
||||||
record of the "original" FlowFile and so on.
|
record of the "original" FlowFile and so on.
|
||||||
</p>
|
</p>
|
||||||
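<p>
A minimal sketch (plain Java, with records modelled as maps rather than NiFi Records) of the index-based correlation described above; the field names and values are invented:
</p>
<pre><code>
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: the i-th "enrichment" record is inserted into the i-th
// "original" record, as described above.
public class InsertEnrichmentFieldsSketch {

    static void join(List<Map<String, Object>> original, List<Map<String, Object>> enrichment) {
        for (int i = 0; i < original.size() && i < enrichment.size(); i++) {
            original.get(i).putAll(enrichment.get(i));   // copy all enrichment fields into the matching record
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> original = List.of(
                new HashMap<>(Map.of("name", "John Doe")),
                new HashMap<>(Map.of("name", "Jane Doe")));
        List<Map<String, Object>> enrichment = List.of(
                new HashMap<>(Map.of("country", "US")),
                new HashMap<>(Map.of("country", "CA")));
        join(original, enrichment);
        // e.g. [{name=John Doe, country=US}, {name=Jane Doe, country=CA}] (field order may vary)
        System.out.println(original);
    }
}
</code></pre>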
@ -323,7 +323,7 @@ FlowFile as its own table with the name "original" while we treat the enrichment
|
|||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Given this, we might combine all of the data using a simple query such as:
|
Given this, we might combine all the data using a simple query such as:
|
||||||
</p>
|
</p>
|
||||||
<pre><code>
|
<pre><code>
|
||||||
SELECT o.*, e.*
|
SELECT o.*, e.*
|
||||||
@ -445,7 +445,7 @@ using this Processor.
|
|||||||
small attributes on a FlowFile is perfectly fine. Storing 300 attributes, on the other hand, may occupy a significant amount of heap.
|
small attributes on a FlowFile is perfectly fine. Storing 300 attributes, on the other hand, may occupy a significant amount of heap.
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
Limit backpressure. The JoinEnrichment Processor will pull into its own memory all of the incoming FlowFiles. As a result, it will be helpful to avoid providing a huge number of FlowFiles
|
Limit backpressure. The JoinEnrichment Processor will pull into its own memory all the incoming FlowFiles. As a result, it will be helpful to avoid providing a huge number of FlowFiles
|
||||||
to the Processor at any given time. This can be done by setting the backpressure limits to a smaller value. For example, in our example above, the ForkEnrichment Processor is connected
|
to the Processor at any given time. This can be done by setting the backpressure limits to a smaller value. For example, in our example above, the ForkEnrichment Processor is connected
|
||||||
directly to the JoinEnrichment Processor. We may want to limit the backpressure on this connection to 500 or 1,000 instead of the default 10,000. Doing so will limit the number of FlowFiles
|
directly to the JoinEnrichment Processor. We may want to limit the backpressure on this connection to 500 or 1,000 instead of the default 10,000. Doing so will limit the number of FlowFiles
|
||||||
that are allowed to be loaded into the JoinEnrichment Processor at one time.
|
that are allowed to be loaded into the JoinEnrichment Processor at one time.
|
||||||
@ -456,7 +456,7 @@ using this Processor.
|
|||||||
<h3>More Complex Joining Strategies</h3>
|
<h3>More Complex Joining Strategies</h3>
|
||||||
<p>
|
<p>
|
||||||
This Processor offers several strategies that can be used for correlating data together and joining records from two different FlowFiles into a single FlowFile. However, there are times
|
This Processor offers several strategies that can be used for correlating data together and joining records from two different FlowFiles into a single FlowFile. However, there are times
|
||||||
when users may require more powerful capabilities than what is offered. We might, for example, want to use the information in an enrichment record to determine whether or not to null out a value in
|
when users may require more powerful capabilities than what is offered. We might, for example, want to use the information in an enrichment record to determine whether to null out a value in
|
||||||
the corresponding original records.
|
the corresponding original records.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
@ -52,7 +52,7 @@
|
|||||||
|
|
||||||
<p>
|
<p>
|
||||||
We can still accomplish the desired use case of waiting until all files in the directory have been processed by splitting apart the FlowFile
|
We can still accomplish the desired use case of waiting until all files in the directory have been processed by splitting apart the FlowFile
|
||||||
and processing all of the data within a Process Group. Configuring the Process Group with a FlowFile Concurrency of "Single FlowFile per Node"
|
and processing all the data within a Process Group. Configuring the Process Group with a FlowFile Concurrency of "Single FlowFile per Node"
|
||||||
means that only one FlowFile will be brought into the Process Group. Once that happens, the FlowFile can be split apart and each part processed.
|
means that only one FlowFile will be brought into the Process Group. Once that happens, the FlowFile can be split apart and each part processed.
|
||||||
Configuring the Process Group with an Outbound Policy of "Batch Output" means that none of the FlowFiles will leave the Process Group until all have
|
Configuring the Process Group with an Outbound Policy of "Batch Output" means that none of the FlowFiles will leave the Process Group until all have
|
||||||
finished processing. As a result, we can build a flow like the following:
|
finished processing. As a result, we can build a flow like the following:
|
||||||
|
@ -52,7 +52,7 @@
|
|||||||
|
|
||||||
<p>
|
<p>
|
||||||
We can still accomplish the desired use case of waiting until all files in the directory have been processed by splitting apart the FlowFile
|
We can still accomplish the desired use case of waiting until all files in the directory have been processed by splitting apart the FlowFile
|
||||||
and processing all of the data within a Process Group. Configuring the Process Group with a FlowFile Concurrency of "Single FlowFile per Node"
|
and processing all the data within a Process Group. Configuring the Process Group with a FlowFile Concurrency of "Single FlowFile per Node"
|
||||||
means that only one FlowFile will be brought into the Process Group. Once that happens, the FlowFile can be split apart and each part processed.
|
means that only one FlowFile will be brought into the Process Group. Once that happens, the FlowFile can be split apart and each part processed.
|
||||||
Configuring the Process Group with an Outbound Policy of "Batch Output" means that none of the FlowFiles will leave the Process Group until all have
|
Configuring the Process Group with an Outbound Policy of "Batch Output" means that none of the FlowFiles will leave the Process Group until all have
|
||||||
finished processing. As a result, we can build a flow like the following:
|
finished processing. As a result, we can build a flow like the following:
|
||||||
|
@ -52,7 +52,7 @@
|
|||||||
|
|
||||||
<p>
|
<p>
|
||||||
We can still accomplish the desired use case of waiting until all files in the directory have been processed by splitting apart the FlowFile
|
We can still accomplish the desired use case of waiting until all files in the directory have been processed by splitting apart the FlowFile
|
||||||
and processing all of the data within a Process Group. Configuring the Process Group with a FlowFile Concurrency of "Single FlowFile per Node"
|
and processing all the data within a Process Group. Configuring the Process Group with a FlowFile Concurrency of "Single FlowFile per Node"
|
||||||
means that only one FlowFile will be brought into the Process Group. Once that happens, the FlowFile can be split apart and each part processed.
|
means that only one FlowFile will be brought into the Process Group. Once that happens, the FlowFile can be split apart and each part processed.
|
||||||
Configuring the Process Group with an Outbound Policy of "Batch Output" means that none of the FlowFiles will leave the Process Group until all have
|
Configuring the Process Group with an Outbound Policy of "Batch Output" means that none of the FlowFiles will leave the Process Group until all have
|
||||||
finished processing. As a result, we can build a flow like the following:
|
finished processing. As a result, we can build a flow like the following:
|
||||||
|
@ -49,8 +49,8 @@
|
|||||||
<p>
|
<p>
|
||||||
How the Processor determines which bin to place a FlowFile in depends on a few different configuration options. Firstly, the Merge Strategy
|
How the Processor determines which bin to place a FlowFile in depends on a few different configuration options. Firstly, the Merge Strategy
|
||||||
is considered. The Merge Strategy can be set to one of two options: "Bin Packing Algorithm," or "Defragment". When the goal is to simply combine
|
is considered. The Merge Strategy can be set to one of two options: "Bin Packing Algorithm," or "Defragment". When the goal is to simply combine
|
||||||
smaller FlowFiles into one larger FlowFile, the Bin Packing Algorithm should be used. This algorithm picks a bin based on whether or not the FlowFile
|
smaller FlowFiles into one larger FlowFile, the Bin Packing Algorithm should be used. This algorithm picks a bin based on whether the FlowFile
|
||||||
can fit in the bin according to its size and the <Maximum Bin Size> property and whether or not the FlowFile is 'like' the other FlowFiles in
|
can fit in the bin according to its size and the <Maximum Bin Size> property and whether the FlowFile is 'like' the other FlowFiles in
|
||||||
the bin. What it means for two FlowFiles to be 'like FlowFiles' is discussed at the end of this section.
|
the bin. What it means for two FlowFiles to be 'like FlowFiles' is discussed at the end of this section.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -62,7 +62,7 @@
|
|||||||
so that the FlowFiles can be ordered correctly. For a given "fragment.identifier", at least one FlowFile must have the "fragment.count" attribute
|
so that the FlowFiles can be ordered correctly. For a given "fragment.identifier", at least one FlowFile must have the "fragment.count" attribute
|
||||||
(which indicates how many FlowFiles belong in the bin). Other FlowFiles with the same identifier must have the same value for the "fragment.count" attribute,
|
(which indicates how many FlowFiles belong in the bin). Other FlowFiles with the same identifier must have the same value for the "fragment.count" attribute,
|
||||||
or they can omit this attribute.
|
or they can omit this attribute.
|
||||||
<b>NOTE:</b> while there are valid use cases for breaking apart FlowFiles and later re-merging them, it is an anti-pattern to take a larger FlowFile,
|
<b>NOTE:</b> while there are valid use cases for breaking apart FlowFiles and later re-merging them, it is an antipattern to take a larger FlowFile,
|
||||||
break it into a million tiny FlowFiles, and then re-merge them. Doing so can result in using huge amounts of Java heap and can result in Out Of Memory Errors.
|
break it into a million tiny FlowFiles, and then re-merge them. Doing so can result in using huge amounts of Java heap and can result in Out Of Memory Errors.
|
||||||
Additionally, it adds large amounts of load to the NiFi framework. This can result in increased CPU and disk utilization and can often mean an order of magnitude
|
Additionally, it adds large amounts of load to the NiFi framework. This can result in increased CPU and disk utilization and can often mean an order of magnitude
|
||||||
lower throughput and an order of magnitude higher latency. As an alternative, whenever possible, dataflows should be built to make use of Record-oriented processors,
|
lower throughput and an order of magnitude higher latency. As an alternative, whenever possible, dataflows should be built to make use of Record-oriented processors,
|
||||||
@ -84,8 +84,7 @@
|
|||||||
<h3>When a Bin is Merged</h3>
|
<h3>When a Bin is Merged</h3>
|
||||||
<p>
|
<p>
|
||||||
Above, we discussed how a bin is chosen for a given FlowFile. Once a bin has been created and FlowFiles added to it, we must have some way to determine
|
Above, we discussed how a bin is chosen for a given FlowFile. Once a bin has been created and FlowFiles added to it, we must have some way to determine
|
||||||
when a bin is "full" so that we can bin those FlowFiles together into a "merged" FlowFile. There are a few criteria that are used to make a determination as
|
when a bin is "full" so that we can bin those FlowFiles together into a "merged" FlowFile.
|
||||||
to whether or not a bin should be merged.
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
@ -112,7 +111,7 @@
|
|||||||
If the <Merge Strategy> property is set to "Defragment", then a bin is full only when the number of FlowFiles in the bin is equal to the number specified
|
If the <Merge Strategy> property is set to "Defragment", then a bin is full only when the number of FlowFiles in the bin is equal to the number specified
|
||||||
by the "fragment.count" attribute of one of the FlowFiles in the bin. All FlowFiles that have this attribute must have the same value for this attribute,
|
by the "fragment.count" attribute of one of the FlowFiles in the bin. All FlowFiles that have this attribute must have the same value for this attribute,
|
||||||
or else they will be routed to the "failure" relationship. It is not necessary that all FlowFiles have this value, but at least one FlowFile in the bin must have
|
or else they will be routed to the "failure" relationship. It is not necessary that all FlowFiles have this value, but at least one FlowFile in the bin must have
|
||||||
this value or the bin will never be complete. If all of the necessary FlowFiles are not binned together by the point at which the bin times amount
|
this value or the bin will never be complete. If all the necessary FlowFiles are not binned together by the point at which the bin times out
|
||||||
(as specified by the <Max Bin Age> property), then the FlowFiles will all be routed to the 'failure' relationship instead of being merged together.
|
(as specified by the <Max Bin Age> property), then the FlowFiles will all be routed to the 'failure' relationship instead of being merged together.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
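<p>
A minimal sketch (plain Java, with FlowFiles modelled as attribute maps, not NiFi code) of the Defragment completeness rule described above; the identifier value is invented:
</p>
<pre><code>
import java.util.List;
import java.util.Map;

// Sketch only: a "Defragment" bin is complete once the number of binned
// FlowFiles equals the agreed "fragment.count" value, as described above.
public class DefragmentBinSketch {

    static boolean isBinComplete(List<Map<String, String>> binnedFlowFiles) {
        String expectedCount = null;
        for (Map<String, String> attributes : binnedFlowFiles) {
            String count = attributes.get("fragment.count");
            if (count == null) {
                continue;                                 // the attribute may be omitted on some FlowFiles
            }
            if (expectedCount != null && !expectedCount.equals(count)) {
                throw new IllegalStateException("Conflicting fragment.count values: bin routed to failure");
            }
            expectedCount = count;
        }
        // At least one FlowFile must carry the attribute, and the bin size must match it.
        return expectedCount != null && binnedFlowFiles.size() == Integer.parseInt(expectedCount);
    }

    public static void main(String[] args) {
        List<Map<String, String>> bin = List.of(
                Map.of("fragment.identifier", "abc", "fragment.count", "2"),
                Map.of("fragment.identifier", "abc"));
        System.out.println(isBinComplete(bin));   // true: 2 FlowFiles and fragment.count is 2
    }
}
</code></pre>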
@ -150,7 +149,7 @@
|
|||||||
Bin has not yet reached either of the minimum thresholds. Note that the age here is determined by when the Bin was created, NOT the age of the FlowFiles that reside within those
|
Bin has not yet reached either of the minimum thresholds. Note that the age here is determined by when the Bin was created, NOT the age of the FlowFiles that reside within those
|
||||||
Bins. As a result, if the Processor is stopped until it has 1 million FlowFiles queued, each one being 10 days old, but the Max Bin Age is set to "1 day," the Max Bin Age will not
|
Bins. As a result, if the Processor is stopped until it has 1 million FlowFiles queued, each one being 10 days old, but the Max Bin Age is set to "1 day," the Max Bin Age will not
|
||||||
be met for at least one full day, even though the FlowFiles themselves are much older than this threshold. If the Processor is stopped and restarted, all Bins are destroyed and
|
be met for at least one full day, even though the FlowFiles themselves are much older than this threshold. If the Processor is stopped and restarted, all Bins are destroyed and
|
||||||
recreated, so the timer is reset.</td>
|
recreated, and the timer is reset.</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<td>BIN_MANAGER_FULL</td>
|
<td>BIN_MANAGER_FULL</td>
|
||||||
|
@ -50,8 +50,8 @@
|
|||||||
<p>
|
<p>
|
||||||
How the Processor determines which bin to place a FlowFile in depends on a few different configuration options. Firstly, the Merge Strategy
|
How the Processor determines which bin to place a FlowFile in depends on a few different configuration options. Firstly, the Merge Strategy
|
||||||
is considered. The Merge Strategy can be set to one of two options: Bin Packing Algorithm, or Defragment. When the goal is to simply combine
|
is considered. The Merge Strategy can be set to one of two options: Bin Packing Algorithm, or Defragment. When the goal is to simply combine
|
||||||
smaller FlowFiles into one larger FlowFiles, the Bin Packing Algorithm should be used. This algorithm picks a bin based on whether or not the FlowFile
|
smaller FlowFiles into one larger FlowFile, the Bin Packing Algorithm should be used. This algorithm picks a bin based on whether the FlowFile
|
||||||
can fit in the bin according to its size and the <Maximum Bin Size> property and whether or not the FlowFile is 'like' the other FlowFiles in
|
can fit in the bin according to its size and the <Maximum Bin Size> property and whether the FlowFile is 'like' the other FlowFiles in
|
||||||
the bin. What it means for two FlowFiles to be 'like FlowFiles' is discussed at the end of this section.
|
the bin. What it means for two FlowFiles to be 'like FlowFiles' is discussed at the end of this section.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -87,12 +87,11 @@
|
|||||||
<h3>When a Bin is Merged</h3>
|
<h3>When a Bin is Merged</h3>
|
||||||
<p>
|
<p>
|
||||||
Above, we discussed how a bin is chosen for a given FlowFile. Once a bin has been created and FlowFiles added to it, we must have some way to determine
|
Above, we discussed how a bin is chosen for a given FlowFile. Once a bin has been created and FlowFiles added to it, we must have some way to determine
|
||||||
when a bin is "full" so that we can bin those FlowFiles together into a "merged" FlowFile. There are a few criteria that are used to make a determination as
|
when a bin is "full" so that we can bin those FlowFiles together into a "merged" FlowFile.
|
||||||
to whether or not a bin should be merged.
|
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
If the <Merge Strategy> property is set to "Bin Packing Algorithm" then then the following rules will be evaluated.
|
If the <Merge Strategy> property is set to "Bin Packing Algorithm" then the following rules will be evaluated.
|
||||||
Firstly, in order for a bin to be full, both of the thresholds specified by the <Minimum Bin Size> and the <Minimum Number of Records> properties
|
Firstly, in order for a bin to be full, both of the thresholds specified by the <Minimum Bin Size> and the <Minimum Number of Records> properties
|
||||||
must be satisfied. If one of these properties is not set, then it is ignored. Secondly, if either the <Maximum Bin Size> or the <Maximum Number of
|
must be satisfied. If one of these properties is not set, then it is ignored. Secondly, if either the <Maximum Bin Size> or the <Maximum Number of
|
||||||
Records> property is reached, then the bin is merged. That is, both of the minimum values must be reached but only one of the maximum values need be reached.
|
Records> property is reached, then the bin is merged. That is, both of the minimum values must be reached but only one of the maximum values need be reached.
|
||||||
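<p>
A minimal sketch (plain Java, not NiFi code) of the threshold rules described above: both configured minimums must be met, while reaching either maximum is enough on its own. The property values in the example are invented:
</p>
<pre><code>
import java.util.OptionalLong;

// Sketch only: evaluates the "Bin Packing Algorithm" thresholds described
// above. An unset minimum is ignored; either maximum triggers the merge.
public class BinThresholdSketch {

    static boolean isBinReadyToMerge(long binBytes, long binRecords,
                                     OptionalLong minBytes, OptionalLong minRecords,
                                     OptionalLong maxBytes, OptionalLong maxRecords) {
        // Either maximum being reached forces the merge on its own.
        boolean maxReached = (maxBytes.isPresent() && binBytes >= maxBytes.getAsLong())
                || (maxRecords.isPresent() && binRecords >= maxRecords.getAsLong());
        if (maxReached) {
            return true;
        }
        // Otherwise both configured minimums must be satisfied.
        boolean minBytesOk = minBytes.isEmpty() || binBytes >= minBytes.getAsLong();
        boolean minRecordsOk = minRecords.isEmpty() || binRecords >= minRecords.getAsLong();
        return minBytesOk && minRecordsOk;
    }

    public static void main(String[] args) {
        // 4 records totalling 10 kB, with a minimum of 3 records and a maximum of 5: ready to merge.
        System.out.println(isBinReadyToMerge(10_240, 4,
                OptionalLong.empty(), OptionalLong.of(3),
                OptionalLong.empty(), OptionalLong.of(5)));
    }
}
</code></pre>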
@ -109,7 +108,7 @@
|
|||||||
If the <Merge Strategy> property is set to "Defragment" then a bin is full only when the number of FlowFiles in the bin is equal to the number specified
|
If the <Merge Strategy> property is set to "Defragment" then a bin is full only when the number of FlowFiles in the bin is equal to the number specified
|
||||||
by the "fragment.count" attribute of one of the FlowFiles in the bin. All FlowFiles that have this attribute must have the same value for this attribute,
|
by the "fragment.count" attribute of one of the FlowFiles in the bin. All FlowFiles that have this attribute must have the same value for this attribute,
|
||||||
or else they will be routed to the "failure" relationship. It is not necessary that all FlowFiles have this value, but at least one FlowFile in the bin must have
|
or else they will be routed to the "failure" relationship. It is not necessary that all FlowFiles have this value, but at least one FlowFile in the bin must have
|
||||||
this value or the bin will never be complete. If all of the necessary FlowFiles are not binned together by the point at which the bin times amount
|
this value or the bin will never be complete. If all the necessary FlowFiles are not binned together by the point at which the bin times out
|
||||||
(as specified by the <Max Bin Age> property), then the FlowFiles will all be routed to the 'failure' relationship instead of being merged together.
|
(as specified by the <Max Bin Age> property), then the FlowFiles will all be routed to the 'failure' relationship instead of being merged together.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -117,7 +116,7 @@
|
|||||||
Once a bin is merged into a single FlowFile, it can sometimes be useful to understand why exactly the bin was merged when it was. For example, if the maximum number
|
Once a bin is merged into a single FlowFile, it can sometimes be useful to understand why exactly the bin was merged when it was. For example, if the maximum number
|
||||||
of allowable bins is reached, a merged FlowFile may consist of far fewer records than expected. In order to help understand the behavior, the Processor will emit
|
of allowable bins is reached, a merged FlowFile may consist of far fewer records than expected. In order to help understand the behavior, the Processor will emit
|
||||||
a JOIN Provenance Event when creating the merged FlowFile, and the JOIN event will include a "Details" field that explains why the bin was merged when it was.
|
a JOIN Provenance Event when creating the merged FlowFile, and the JOIN event will include a "Details" field that explains why the bin was merged when it was.
|
||||||
For example, the event will indicate "Records Merged due to: Bin is full" if the bin reached its minimum thresholds and no more subsequent FlowFiles were able to be
|
For example, the event will indicate "Records Merged due to: Bin is full" if the bin reached its minimum thresholds and no more subsequent FlowFiles were
|
||||||
added to it. Or it may indicate "Records Merged due to: Maximum number of bins has been exceeded" if the bin was merged due to the configured maximum number of bins
|
added to it. Or it may indicate "Records Merged due to: Maximum number of bins has been exceeded" if the bin was merged due to the configured maximum number of bins
|
||||||
being filled and needing to free up space for a new bin.
|
being filled and needing to free up space for a new bin.
|
||||||
</p>
|
</p>
|
||||||
@ -125,8 +124,8 @@
|
|||||||
|
|
||||||
<h3>When a Failure Occurs</h3>
|
<h3>When a Failure Occurs</h3>
|
||||||
<p>
|
<p>
|
||||||
When a bin is filled, the Processor is responsible for merging together all of the records in those FlowFiles into a single FlowFile. If the Processor fails
|
When a bin is filled, the Processor is responsible for merging together all the records in those FlowFiles into a single FlowFile. If the Processor fails
|
||||||
to do so for any reason (for example, a Record cannot be read from an input FlowFile), then all of the FlowFiles in that bin are routed to the 'failure'
|
to do so for any reason (for example, a Record cannot be read from an input FlowFile), then all the FlowFiles in that bin are routed to the 'failure'
|
||||||
Relationship. The Processor does not skip the single problematic FlowFile and merge the others. This behavior was chosen because of two different considerations.
|
Relationship. The Processor does not skip the single problematic FlowFile and merge the others. This behavior was chosen because of two different considerations.
|
||||||
Firstly, without those problematic records, the bin may not truly be full, as the minimum bin size may not be reached without those records.
|
Firstly, without those problematic records, the bin may not truly be full, as the minimum bin size may not be reached without those records.
|
||||||
Secondly, and more importantly, if the problematic FlowFile contains 100 "good" records before the problematic ones, those 100 records would already have been
|
Secondly, and more importantly, if the problematic FlowFile contains 100 "good" records before the problematic ones, those 100 records would already have been
|
||||||
@ -205,7 +204,7 @@
|
|||||||
<p>
|
<p>
|
||||||
In this, because we have not configured a Correlation Attribute, and because all FlowFiles have the same schema, the Processor
|
In this, because we have not configured a Correlation Attribute, and because all FlowFiles have the same schema, the Processor
|
||||||
will attempt to add all of these FlowFiles to the same bin. Because the Minimum Number of Records is 3 and the Maximum Number of Records is 5,
|
will attempt to add all of these FlowFiles to the same bin. Because the Minimum Number of Records is 3 and the Maximum Number of Records is 5,
|
||||||
all of the FlowFiles will be added to the same bin. The output, then, is a single FlowFile with the following content:
|
all the FlowFiles will be added to the same bin. The output, then, is a single FlowFile with the following content:
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
<code>
|
<code>
|
||||||
@ -219,7 +218,7 @@ Jan, 2
|
|||||||
</code>
|
</code>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
When the Processor runs, it will bin all of the FlowFiles that it can get from the queue. After that, it will merge any bin that is "full enough."
|
When the Processor runs, it will bin all the FlowFiles that it can get from the queue. After that, it will merge any bin that is "full enough."
|
||||||
So if we had only 3 FlowFiles on the queue, those 3 would have been added, and a new bin would have been created in the next iteration, once the
|
So if we had only 3 FlowFiles on the queue, those 3 would have been added, and a new bin would have been created in the next iteration, once the
|
||||||
4th FlowFile showed up. However, if we had 8 FlowFiles queued up, only 5 would have been added to the first bin. The other 3 would have been added
|
4th FlowFile showed up. However, if we had 8 FlowFiles queued up, only 5 would have been added to the first bin. The other 3 would have been added
|
||||||
to a second bin, and that bin would then be merged since it reached the minimum threshold of 3 also.
|
to a second bin, and that bin would then be merged since it reached the minimum threshold of 3 also.
|
||||||
|
@ -32,7 +32,7 @@
|
|||||||
In order to make the Processor valid, at least one user-defined property must be added to the Processor.
|
In order to make the Processor valid, at least one user-defined property must be added to the Processor.
|
||||||
The value of the property must be a valid RecordPath. Expression Language is supported and will be evaluated before
|
The value of the property must be a valid RecordPath. Expression Language is supported and will be evaluated before
|
||||||
attempting to compile the RecordPath. However, if Expression Language is used, the Processor is not able to validate
|
attempting to compile the RecordPath. However, if Expression Language is used, the Processor is not able to validate
|
||||||
the RecordPath before-hand and may result in having FlowFiles fail processing if the RecordPath is not valid when being
|
the RecordPath beforehand, which may result in FlowFiles failing processing if the RecordPath is not valid when being
|
||||||
used.
|
used.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -46,7 +46,7 @@
|
|||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Once a FlowFile has been written, we know that all of the Records within that FlowFile have the same value for the fields that are
|
Once a FlowFile has been written, we know that all the Records within that FlowFile have the same value for the fields that are
|
||||||
described by the configured RecordPaths. As a result, we can promote those values to FlowFile Attributes. We do so
|
described by the configured RecordPaths. As a result, we can promote those values to FlowFile Attributes. We do so
|
||||||
by looking at the name of the property to which each RecordPath belongs. For example, if we have a property named <code>country</code>
|
by looking at the name of the property to which each RecordPath belongs. For example, if we have a property named <code>country</code>
|
||||||
with a value of <code>/geo/country/name</code>, then each outbound FlowFile will have an attribute named <code>country</code> with the
|
with a value of <code>/geo/country/name</code>, then each outbound FlowFile will have an attribute named <code>country</code> with the
|
||||||
@ -142,7 +142,7 @@
|
|||||||
<h3>Example 1 - Partition By Simple Field</h3>
|
<h3>Example 1 - Partition By Simple Field</h3>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
For a simple case, let's partition all of the records based on the state that they live in.
|
For a simple case, let's partition all the records based on the state that they live in.
|
||||||
We can add a property named <code>state</code> with a value of <code>/locations/home/state</code>.
|
We can add a property named <code>state</code> with a value of <code>/locations/home/state</code>.
|
||||||
The result will be that we will have two outbound FlowFiles. The first will contain an attribute with the name
|
The result will be that we will have two outbound FlowFiles. The first will contain an attribute with the name
|
||||||
<code>state</code> and a value of <code>NY</code>. This FlowFile will consist of 3 records: John Doe, Jane Doe, and Jacob Doe.
|
<code>state</code> and a value of <code>NY</code>. This FlowFile will consist of 3 records: John Doe, Jane Doe, and Jacob Doe.
|
||||||
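<p>
A minimal sketch (plain Java, with records modelled as nested maps; this is not a real RecordPath implementation) of the partitioning described above; the record values are partly invented for the example:
</p>
<pre><code>
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: groups records by the value found at a simple slash-separated
// path (here "/locations/home/state"); each group would become one outbound
// FlowFile carrying a "state" attribute with that value.
public class PartitionByPathSketch {

    static Object valueAt(Map<String, Object> rec, String path) {
        Object current = rec;
        for (String segment : path.substring(1).split("/")) {
            current = ((Map<?, ?>) current).get(segment);   // descend one level per path segment
        }
        return current;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> records = List.of(
                Map.of("name", "John Doe", "locations", Map.of("home", Map.of("state", "NY"))),
                Map.of("name", "Jane Doe", "locations", Map.of("home", Map.of("state", "NY"))),
                Map.of("name", "Jim Doe", "locations", Map.of("home", Map.of("state", "CA"))));

        Map<Object, List<Map<String, Object>>> partitions = new LinkedHashMap<>();
        for (Map<String, Object> rec : records) {
            Object state = valueAt(rec, "/locations/home/state");
            partitions.computeIfAbsent(state, k -> new ArrayList<>()).add(rec);
        }
        // Prints: NY -> 2 record(s), then CA -> 1 record(s)
        partitions.forEach((state, group) -> System.out.println(state + " -> " + group.size() + " record(s)"));
    }
}
</code></pre>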
@ -174,7 +174,7 @@
|
|||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
This will result in three different FlowFiles being created. The first FlowFile will contain records for John Doe and Jane Doe. If will contain an attribute
|
This will result in three different FlowFiles being created. The first FlowFile will contain records for John Doe and Jane Doe. It will contain an attribute
|
||||||
named "favorite.food" with a value of "spaghetti." However, because the second RecordPath pointed to a Record field, no "home" attribute will be added.
|
named "favorite.food" with a value of "spaghetti." However, because the second RecordPath pointed to a Record field, no "home" attribute will be added.
|
||||||
In this case, both of these records have the same value for both the first element of the "favorites" array
|
In this case, both of these records have the same value for both the first element of the "favorites" array
|
||||||
and the same value for the home address. Janet Doe has the same value for the first element in the "favorites" array but has a different home address. Similarly,
|
and the same value for the home address. Janet Doe has the same value for the first element in the "favorites" array but has a different home address. Similarly,
|
||||||
|
@ -132,14 +132,14 @@
|
|||||||
It is also worth noting that the outbound FlowFiles have two different schemas. The <code>Engineers</code> and <code>Younger Than Average</code> FlowFiles contain 3 fields:
|
It is also worth noting that the outbound FlowFiles have two different schemas. The <code>Engineers</code> and <code>Younger Than Average</code> FlowFiles contain 3 fields:
|
||||||
<code>name</code>, <code>age</code>, and <code>title</code> while the <code>VP</code> FlowFile contains only the <code>name</code> field. In most cases, the Record Writer is configured to
|
<code>name</code>, <code>age</code>, and <code>title</code> while the <code>VP</code> FlowFile contains only the <code>name</code> field. In most cases, the Record Writer is configured to
|
||||||
use whatever Schema is provided to it by the Record (this generally means that it is configured with a <code>Schema Access Strategy</code> of <code>Inherit Record Schema</code>). In such
|
use whatever Schema is provided to it by the Record (this generally means that it is configured with a <code>Schema Access Strategy</code> of <code>Inherit Record Schema</code>). In such
|
||||||
a case, this works well. However, if a Schema is supplied to the Record Writer explicitly, it is important to ensure that the Schema accounts for all fields. If not, then then the
|
a case, this works well. However, if a Schema is supplied to the Record Writer explicitly, it is important to ensure that the Schema accounts for all fields. If not, then the
|
||||||
fields that are missing from the Record Writer's schema will simply not be present in the output.
|
fields that are missing from the Record Writer's schema will simply not be present in the output.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
|
||||||
<h3>SQL Over Hierarchical Data</h3>
|
<h3>SQL Over Hierarchical Data</h3>
|
||||||
<p>
|
<p>
|
||||||
One important detail that we must taken into account when evaluating SQL over streams of arbitrary data is how
|
One important detail that we must take into account when evaluating SQL over streams of arbitrary data is how
|
||||||
we can handle hierarchical data, such as JSON, XML, and Avro. Because SQL was developed originally for relational databases, which
|
we can handle hierarchical data, such as JSON, XML, and Avro. Because SQL was developed originally for relational databases, which
|
||||||
represent "flat" data, it is easy to understand how this would map to other "flat" data like a CSV file. Or even
|
represent "flat" data, it is easy to understand how this would map to other "flat" data like a CSV file. Or even
|
||||||
a "flat" JSON representation where all fields are primitive types. However, in many cases, users encounter cases where they would like to evaluate SQL
|
a "flat" JSON representation where all fields are primitive types. However, in many cases, users encounter cases where they would like to evaluate SQL
|
||||||
|
@ -44,7 +44,7 @@
|
|||||||
<h3>Modes</h3>
|
<h3>Modes</h3>
|
||||||
<p>
|
<p>
|
||||||
This processor is used to tail a file or multiple files, depending on the configured mode. The
|
This processor is used to tail a file or multiple files, depending on the configured mode. The
|
||||||
mode to choose depends of the logging pattern followed by the file(s) to tail. In any case, if there
|
mode to choose depends on the logging pattern followed by the file(s) to tail. In any case, if there
|
||||||
is a rolling pattern, the rolling files must be plain text files (compression is not supported at
|
is a rolling pattern, the rolling files must be plain text files (compression is not supported at
|
||||||
the moment).
|
the moment).
|
||||||
</p>
|
</p>
|
||||||
@ -171,7 +171,7 @@
|
|||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Additionally, we run the chance of the Regular Expression not matching the data in the file. This could result in buffering all of the file's content, which could cause NiFi
|
Additionally, we run the chance of the Regular Expression not matching the data in the file. This could result in buffering all the file's content, which could cause NiFi
|
||||||
to run out of memory. To avoid this, the <Max Buffer Size> property limits the amount of data that can be buffered. If this amount of data is buffered, it will be flushed
|
to run out of memory. To avoid this, the <Max Buffer Size> property limits the amount of data that can be buffered. If this amount of data is buffered, it will be flushed
|
||||||
to the FlowFile, even if another message hasn't been encountered.
|
to the FlowFile, even if another message hasn't been encountered.
|
||||||
</p>
|
</p>
|
||||||
|
@ -51,7 +51,7 @@
|
|||||||
</p>
|
</p>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
Below, we lay out some examples in order to provide clarity about the Processor's behavior. For all of
|
Below, we lay out some examples in order to provide clarity about the Processor's behavior. For all
|
||||||
the examples below, consider the example to operate on the following set of 2 (JSON) records:
|
the examples below, consider the example to operate on the following set of 2 (JSON) records:
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -210,7 +210,7 @@
|
|||||||
|
|
||||||
<p>
|
<p>
|
||||||
In the above example, we replaced the value of field based on another RecordPath. That RecordPath was an "absolute RecordPath,"
|
In the above example, we replaced the value of field based on another RecordPath. That RecordPath was an "absolute RecordPath,"
|
||||||
meaning that it starts with a "slash" character (<code>/</code>) and therefore it specifies the path from the "root" or "outer most" element.
|
meaning that it starts with a "slash" character (<code>/</code>) and therefore it specifies the path from the "root" or "outermost" element.
|
||||||
However, sometimes we want to reference a field in such a way that we define the RecordPath relative to the field being updated. This example
|
However, sometimes we want to reference a field in such a way that we define the RecordPath relative to the field being updated. This example
|
||||||
does just that. For each of the siblings given in the "siblings" array, we will replace the sibling's name with their id. To do so, we will
|
does just that. For each of the siblings given in the "siblings" array, we will replace the sibling's name with their id. To do so, we will
|
||||||
configure the processor with the following properties:
|
configure the processor with the following properties:
|
||||||
|
@ -95,7 +95,7 @@
|
|||||||
<ul>
|
<ul>
|
||||||
<b>Schema property:</b> Unique(), UniqueHashCode()<br />
|
<b>Schema property:</b> Unique(), UniqueHashCode()<br />
|
||||||
<b>Meaning:</b> the input CSV has two columns. All the values of the first column must be unique (all the values are stored in
|
<b>Meaning:</b> the input CSV has two columns. All the values of the first column must be unique (all the values are stored in
|
||||||
memory and this can be consuming depending of the input). All the values of the second column must be unique (only hash
|
memory). All the values of the second column must be unique (only hash
|
||||||
codes of the input values are stored to ensure uniqueness).
|
codes of the input values are stored to ensure uniqueness).
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
@ -55,7 +55,7 @@
|
|||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
If may be convenient to redirect the logging output of this ReportingTask to a separate log file than the typical application log.
|
It may be convenient to redirect the logging output of this ReportingTask to a separate log file than the typical application log.
|
||||||
This can be accomplished by modifying the logback.xml file in the NiFi conf/ directory such that a logger with the name
|
This can be accomplished by modifying the logback.xml file in the NiFi conf/ directory such that a logger with the name
|
||||||
<code>org.apache.nifi.controller.ControllerStatusReportingTask</code> is configured to write to a separate log.
|
<code>org.apache.nifi.controller.ControllerStatusReportingTask</code> is configured to write to a separate log.
|
||||||
</p>
|
</p>
|
||||||
|
@ -103,7 +103,7 @@ Oct 12 04:16:11 localhost CEF:0|Company|Product|1.2.3|audit-login|Successful log
|
|||||||
|
|
||||||
<p>
|
<p>
|
||||||
A common concern when inferring schemas is how to handle the condition of two values that have different types. For example, a custom extension field might
|
A common concern when inferring schemas is how to handle the condition of two values that have different types. For example, a custom extension field might
|
||||||
have a Float value in one record and String in an other. In these cases, the inferred will contain a CHOICE data type with FLOAT and STRING options. Records will
|
have a Float value in one record and a String in another. In these cases, the inferred schema will contain a CHOICE data type with FLOAT and STRING options. Records will
|
||||||
be allowed to have either value for the particular field.
|
be allowed to have either value for the particular field.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -111,7 +111,7 @@ Oct 12 04:16:11 localhost CEF:0|Company|Product|1.2.3|audit-login|Successful log
|
|||||||
The CEF format comes with a specification not only for the message format but also with directives for the content. Because of this, the data types of some
|
The CEF format comes with a specification not only for the message format but also with directives for the content. Because of this, the data types of some
|
||||||
fields are not determined by the actual value(s) in the FlowFile but by the CEF format. This includes header fields, which always have to appear and
|
fields are not determined by the actual value(s) in the FlowFile but by the CEF format. This includes header fields, which always have to appear and
|
||||||
comply to the data types defined in the CEF format. Also, extension fields from the Extension Dictionary might or might not appear in the generated
|
comply to the data types defined in the CEF format. Also, extension fields from the Extension Dictionary might or might not appear in the generated
|
||||||
schema based on the FlowFile content but in case an extension field is added it's data type is bound by the CEF format. Custom extensions have no similar
|
schema based on the FlowFile content but in case an extension field is added its data type is bound by the CEF format. Custom extensions have no similar
|
||||||
restrictions; their presence in the schema depends entirely on the FlowFile content.
|
restrictions; their presence in the schema depends entirely on the FlowFile content.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
@ -108,10 +108,10 @@ Jane, Ten
|
|||||||
When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
|
When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
|
||||||
to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
|
to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
|
||||||
values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
|
values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
|
||||||
than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types with the Exception of MAP, RECORD, ARRAY, and CHOICE.
|
than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types except MAP, RECORD, ARRAY, and CHOICE.
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
Before inferring the type of a value, leading and trailing whitespace are removed. Additionally, if the value is surrounded by double-quotes ("), the double-quotes
|
Before inferring the type of a value, leading and trailing whitespace are removed. Additionally, if the value is surrounded by double-quotes ("), the double-quotes
|
||||||
are removed. Therefore, the value <code>16</code> is interpreted the same as <code> "16"</code>. Both will be interpreted as an INT. However, the value
|
are removed. Therefore, the value <code>16</code> is interpreted the same as <code> "16"</code>. Both will be interpreted as an INT. However, the value
|
||||||
<code>" 16"</code> will be inferred as a STRING type because the white space is enclosed within double-quotes, which means that the white space is considered
|
<code>" 16"</code> will be inferred as a STRING type because the white space is enclosed within double-quotes, which means that the white space is considered
|
||||||
part of the value.
|
part of the value.
|
||||||
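<p>
A minimal sketch (plain Java, a deliberate simplification of the inference engine) covering the value-trimming rule above and a small slice of the type-widening order (SHORT, INT, LONG, STRING from narrowest to widest); only the cases shown are handled:
</p>
<pre><code>
import java.util.List;

// Sketch only: value trimming and a small slice of the "wider type" ordering
// described above. Not NiFi's actual schema inference.
public class TypeInferenceSketch {

    enum InferredType { SHORT, INT, LONG, STRING }

    static String normalize(String raw) {
        String value = raw.strip();                              // leading/trailing whitespace is removed first
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);      // surrounding double-quotes are removed
        }
        return value;
    }

    static InferredType infer(String raw) {
        String value = normalize(raw);
        // Whitespace that was inside the quotes remains part of the value, so " 16" stays a STRING.
        return value.matches("-?\\d+") ? InferredType.INT : InferredType.STRING;
    }

    static InferredType widen(InferredType a, InferredType b) {
        // The wider of the two types wins; STRING is wider than the numeric types shown here.
        List<InferredType> order = List.of(InferredType.SHORT, InferredType.INT, InferredType.LONG, InferredType.STRING);
        return order.indexOf(a) >= order.indexOf(b) ? a : b;
    }

    public static void main(String[] args) {
        System.out.println(infer("16"));                                // INT
        System.out.println(infer(" \"16\" "));                          // INT - quotes and outer whitespace stripped
        System.out.println(infer("\" 16\""));                           // STRING - the space is enclosed in the quotes
        System.out.println(widen(InferredType.INT, InferredType.LONG)); // LONG
    }
}
</code></pre>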
|
@ -30,7 +30,7 @@
|
|||||||
a file that contains Grok Patterns that can be used for parsing log data. If not specified, a default
|
a file that contains Grok Patterns that can be used for parsing log data. If not specified, a default
|
||||||
patterns file will be used. Its contents are provided below. There are also properties for specifying
|
patterns file will be used. Its contents are provided below. There are also properties for specifying
|
||||||
the schema to use when parsing data. The schema is not required. However, when data is parsed
|
the schema to use when parsing data. The schema is not required. However, when data is parsed
|
||||||
a Record is created that contains all of the fields present in the Grok Expression (explained below),
|
a Record is created that contains all the fields present in the Grok Expression (explained below),
|
||||||
and all fields are of type String. If a schema is chosen, the field can be declared to be a different,
|
and all fields are of type String. If a schema is chosen, the field can be declared to be a different,
|
||||||
compatible type, such as number. Additionally, if the schema does not contain one of the fields in the
|
compatible type, such as number. Additionally, if the schema does not contain one of the fields in the
|
||||||
parsed data, that field will be ignored. This can be used to filter out fields that are not of interest.
|
parsed data, that field will be ignored. This can be used to filter out fields that are not of interest.
|
||||||
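<p>
Conceptually, a Grok Expression acts like a regular expression with named captures: each capture becomes a String field of the Record, and fields absent from the chosen schema are dropped. A minimal sketch (plain Java regular expressions, not the actual Grok engine; the pattern, log line, and schema fields are invented):
</p>
<pre><code>
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: named captures become String fields; fields not declared by the
// schema are dropped, as described above.
public class GrokLikeParsingSketch {

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("(?<timestamp>\\S+ \\S+) (?<level>\\w+) (?<message>.*)");
        String line = "2024-01-01 12:00:00,000 INFO Processor started";

        Matcher matcher = pattern.matcher(line);
        if (matcher.matches()) {
            // Every named capture becomes a String-typed field of the record.
            Map<String, String> record = new LinkedHashMap<>();
            record.put("timestamp", matcher.group("timestamp"));
            record.put("level", matcher.group("level"));
            record.put("message", matcher.group("message"));

            // If a schema is supplied, only the fields it declares are kept.
            Set<String> schemaFields = Set.of("timestamp", "level");
            record.keySet().retainAll(schemaFields);
            System.out.println(record);   // {timestamp=2024-01-01 12:00:00,000, level=INFO}
        }
    }
}
</code></pre>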
|
@ -46,7 +46,7 @@
|
|||||||
<p>
|
<p>
|
||||||
This Controller Service must be configured with a schema. Each JSON Path that is evaluated and is found in the "root level"
|
This Controller Service must be configured with a schema. Each JSON Path that is evaluated and is found in the "root level"
|
||||||
of the schema will produce a Field in the Record. I.e., the schema should match the Record that is created by evaluating all
|
of the schema will produce a Field in the Record. I.e., the schema should match the Record that is created by evaluating all
|
||||||
of the JSON Paths. It should not match the "incoming JSON" that is read from the FlowFile.
|
the JSON Paths. It should not match the "incoming JSON" that is read from the FlowFile.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
|
||||||
@ -130,7 +130,7 @@
|
|||||||
When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
|
When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
|
||||||
to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
|
to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
|
||||||
values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
|
values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
|
||||||
than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types with the Exception of MAP, RECORD, ARRAY, and CHOICE.
|
than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types except MAP, RECORD, ARRAY, and CHOICE.
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
If two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), but neither value is of a type that
|
If two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), but neither value is of a type that
|
||||||
|
@ -118,7 +118,7 @@
|
|||||||
When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
|
When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
|
||||||
to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
|
to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
|
||||||
values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
|
values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
|
||||||
than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types with the Exception of MAP, RECORD, ARRAY, and CHOICE.
|
than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types except MAP, RECORD, ARRAY, and CHOICE.
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
If two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), but neither value is of a type that
|
If two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), but neither value is of a type that
|
||||||
|
@ -75,7 +75,7 @@
|
|||||||
</code>
|
</code>
|
||||||
|
|
||||||
<p>
|
<p>
|
||||||
This record can be described by a schema containing one field (e. g. of type string). By providing this schema,
|
This record can be described by a schema containing one field (e.g. of type string). By providing this schema,
|
||||||
the reader expects zero or one occurrences of "simple_field" in the record.
|
the reader expects zero or one occurrences of "simple_field" in the record.
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
@ -584,7 +584,7 @@
 The "Field Name for Content" property is not set, and the XML element has a sub-element named "value". The name of the sub-element clashes with the
 default field name added to the schema by the Schema Inference logic (see Example 2). As seen in the output data, the input XML attribute's value
 is added to the record just like in the previous examples. The value of the <code><value></code> element is retained, but the content of the
-<code><field_with_attribute></code> that was outside of the sub-element, is lost.
+<code><field_with_attribute></code> that was outside the sub-element, is lost.
 </p>

 <h3>XML Attributes and Schema Inference Example 5</h3>
@ -907,7 +907,7 @@
 When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
 to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
 values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
-than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types with the Exception of MAP, RECORD, ARRAY, and CHOICE.
+than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types except MAP, RECORD, ARRAY, and CHOICE.
 </li>
 <li>
 If two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), but neither value is of a type that
@ -118,7 +118,7 @@
 When two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), the inference engine prefers
 to use a "wider" data type over using a CHOICE data type. A data type "A" is said to be wider than data type "B" if and only if data type "A" encompasses all
 values of "B" in addition to other values. For example, the LONG type is wider than the INT type but not wider than the BOOLEAN type (and BOOLEAN is also not wider
-than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types with the Exception of MAP, RECORD, ARRAY, and CHOICE.
+than LONG). INT is wider than SHORT. The STRING type is considered wider than all other types except MAP, RECORD, ARRAY, and CHOICE.
 </li>
 <li>
 If two values are encountered for the same field in two different records (or two values are encountered for an ARRAY type), but neither value is of a type that
@ -342,7 +342,7 @@

 In the event that the processor is unable to get the state at the beginning of the onTrigger, the FlowFile will be pushed back to the originating relationship and the processor will yield.
 If the processor is able to get the state at the beginning of the onTrigger but unable to set the state after adding attributes to the FlowFile, the FlowFile will be transferred to
-"set state fail". This is normally due to the state not being the most up to date version (another thread has replaced the state with another version). In most use-cases this relationship
+"set state fail". This is normally due to the state not being the most recent version (another thread has replaced the state with another version). In most use-cases this relationship
 should loop back to the processor since the only affected attributes will be overwritten.

 Note: Currently the only "stateful" option is to store state locally. This is done because the current implementation of clustered state relies on Zookeeper and Zookeeper isn't designed
@ -367,7 +367,7 @@
 <strong>Notes about Concurrency and Stateful Usage</strong></p>
 <p>
 When using the stateful option, concurrent tasks should be used with caution. If every incoming FlowFile will update state then it will be much more efficient to have only one
-task. This is because the first thing the onTrigger does is get the state and the last thing it does is store the state if there are an updates. If it does not have the most up to date
+task. This is because the first thing the onTrigger does is get the state and the last thing it does is store the state if there are an updates. If it does not have the most recent
 initial state when it goes to update it will fail and send the FlowFile to "set state fail". This is done so that the update is successful when it was done with the most recent information.
 If it didn't do it in this mock-atomic way, there'd be no guarantee that the state is accurate.

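The "mock-atomic" behaviour described in the two hunks above maps onto NiFi's StateManager API: read the state, modify a copy, and store it only if the stored version is still the one that was read. A minimal sketch of that pattern follows; the helper method is hypothetical and is not UpdateAttribute's actual implementation.

// Minimal sketch of the optimistic get-then-replace state update described above,
// assuming an imaginary helper called from a processor's onTrigger.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.nifi.components.state.Scope;
import org.apache.nifi.components.state.StateManager;
import org.apache.nifi.components.state.StateMap;

class StatefulUpdateSketch {

    // Returns true if the state was stored; false means another thread replaced the state first,
    // which is the situation routed to the "set state fail" relationship in the documentation above.
    boolean updateState(final StateManager stateManager, final String key, final String value) throws IOException {
        final StateMap oldState = stateManager.getState(Scope.LOCAL);   // read the current version first

        final Map<String, String> newState = new HashMap<>(oldState.toMap());
        newState.put(key, value);

        // replace() only succeeds if the stored state is still the version read above
        return stateManager.replace(oldState, newState, Scope.LOCAL);
    }
}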
@ -208,7 +208,7 @@ public abstract class AbstractComponentNode implements ComponentNode {
 * configured set of properties
 */
 protected boolean isClasspathDifferent(final Map<PropertyDescriptor, String> properties) {
-// If any property in the given map modifies classpath and is different than the currently configured value,
+// If any property in the given map modifies classpath and is different from the currently configured value,
 // the given properties will require a different classpath.
 for (final Map.Entry<PropertyDescriptor, String> entry : properties.entrySet()) {
 final PropertyDescriptor descriptor = entry.getKey();
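For context, the comment corrected above guards a loop that looks for classpath-modifying properties whose values changed. A sketch of that kind of check is shown below; it assumes the nifi-api accessor PropertyDescriptor.isDynamicClasspathModifier() and takes the currently configured values as a plain map, so it is an approximation rather than AbstractComponentNode's actual method body.

// Sketch of the check the comment above describes: a property set requires a different
// classpath when any classpath-modifying property has a changed value.
// Assumes PropertyDescriptor.isDynamicClasspathModifier(); not the real method body.
import java.util.Map;
import java.util.Objects;

import org.apache.nifi.components.PropertyDescriptor;

final class ClasspathDiffSketch {

    static boolean isClasspathDifferent(final Map<PropertyDescriptor, String> proposed,
                                        final Map<PropertyDescriptor, String> current) {
        for (final Map.Entry<PropertyDescriptor, String> entry : proposed.entrySet()) {
            final PropertyDescriptor descriptor = entry.getKey();
            if (!descriptor.isDynamicClasspathModifier()) {
                continue; // only classpath-modifying properties matter here
            }
            final String proposedValue = entry.getValue();
            final String currentValue = current.get(descriptor);
            if (!Objects.equals(proposedValue, currentValue)) {
                return true; // a classpath-modifying property changed, so a different classpath is needed
            }
        }
        return false;
    }
}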
@ -308,7 +308,7 @@ public abstract class AbstractComponentNode implements ComponentNode {
 if (propertyName != null && entry.getValue() == null) {
 removeProperty(propertyName, allowRemovalOfRequiredProperties);
 } else if (propertyName != null) {
-// Use the EL-Agnostic Parameter Parser to gather the list of referenced Parameters. We do this because we want to to keep track of which parameters
+// Use the EL-Agnostic Parameter Parser to gather the list of referenced Parameters. We do this because we want to keep track of which parameters
 // are referenced, regardless of whether or not they are referenced from within an EL Expression. However, we also will need to derive a different ParameterTokenList
 // that we can provide to the PropertyConfiguration, so that when compiling the Expression Language Expressions, we are able to keep the Parameter Reference within
 // the Expression's text.
@ -71,7 +71,7 @@
 'TLS - Client Auth' - Client authentication policy when connecting to LDAP using LDAPS or START_TLS.
 Possible values are REQUIRED, WANT, NONE.
 'TLS - Protocol' - Protocol to use when connecting to LDAP using LDAPS or START_TLS. (i.e. TLS,
-TLSv1.1, TLSv1.2, etc).
+TLSv1.1, TLSv1.2, etc.).
 'TLS - Shutdown Gracefully' - Specifies whether the TLS should be shut down gracefully
 before the target context is closed. Defaults to false.

@ -255,7 +255,7 @@
 The FileAccessPolicyProvider will provide support for managing access policies which is backed by a file
 on the local file system.

-- User Group Provider - The identifier for an User Group Provider defined above that will be used to access
+- User Group Provider - The identifier for a User Group Provider defined above that will be used to access
 users and groups for use in the managed access policies.

 - Authorizations File - The file where the FileAccessPolicyProvider will store policies.
@ -355,7 +355,7 @@ public abstract class ApplicationResource {
 * When a two-phase commit style request is used, the first phase (generally referred to
 * as the "commit-request stage") is intended to validate that the request can be completed.
 * In NiFi, we use this phase to validate that the request can complete. This method determines
-* whether or not the request is the first phase of a two-phase commit.
+* whether the request is the first phase of a two-phase commit.
 *
 * @param httpServletRequest the request
 * @return <code>true</code> if the request represents a two-phase commit style request and is the
@ -374,7 +374,7 @@ public abstract class ApplicationResource {
 }

 /**
-* Checks whether or not the request should be replicated to the cluster
+* Checks whether the request should be replicated to the cluster
 *
 * @return <code>true</code> if the request should be replicated, <code>false</code> otherwise
 */
@ -847,7 +847,7 @@ public abstract class ApplicationResource {
 * @throws UnknownNodeException if the nodeUuid given does not map to any node in the cluster
 */
 protected Response replicate(final URI path, final String method, final Object entity, final String nodeUuid, final Map<String, String> headersToOverride) {
-// since we're cluster we must specify the cluster node identifier
+// since we're in a cluster we must specify the cluster node identifier
 if (nodeUuid == null) {
 throw new IllegalArgumentException("The cluster node identifier must be specified.");
 }
@ -150,7 +150,7 @@
 </build>

 <profiles>
-<!-- Specifies an empty buildRevision and buildBranch when building outside of a git repo -->
+<!-- Specifies an empty buildRevision and buildBranch when building outside a git repo -->
 <profile>
 <id>build-info-no-git</id>
 <activation>
@ -224,7 +224,7 @@
 The FileAccessPolicyProvider will provide support for managing access policies which is backed by a file
 on the local file system.

-- User Group Provider - The identifier for an User Group Provider defined above that will be used to access
+- User Group Provider - The identifier for a User Group Provider defined above that will be used to access
 users and groups for use in the managed access policies.

 - Authorizations File - The file where the FileAccessPolicyProvider will store policies.
@ -23,7 +23,7 @@
 <artifactId>nifi-registry-revision-common</artifactId>
 <packaging>jar</packaging>

-<!-- NOTE: This module should be mindful of it's dependencies and should generally only depend on the revision API -->
+<!-- NOTE: This module should be mindful of its dependencies and should generally only depend on the revision API -->
 <dependencies>
 <dependency>
 <groupId>org.slf4j</groupId>
@ -95,7 +95,7 @@ public abstract class AbstractCommand<R extends Result> implements Command<R> {
 }

 protected void doInitialize(final Context context) {
-// sub-classes can override to do additional things like add options
+// subclasses can override to do additional things like add options
 }

 protected void addOption(final Option option) {
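The corrected comment above describes a hook that subclasses override to register command-line options. The sketch below illustrates the pattern with a stand-in base class and Commons CLI's Option; it is not the toolkit's real AbstractCommand, only an illustration of the override-and-addOption idiom.

// Standalone sketch of the "subclasses override doInitialize to add options" pattern.
// The tiny base class stands in for the toolkit's AbstractCommand and is not the real class.
import org.apache.commons.cli.Option;
import org.apache.commons.cli.Options;

class CommandSketch {

    private final Options options = new Options();

    protected void doInitialize() {
        // subclasses can override to do additional things like add options
    }

    protected void addOption(final Option option) {
        options.addOption(option);
    }

    final void initialize() {
        doInitialize();
    }

    Options getOptions() {
        return options;
    }
}

class ListItemsCommandSketch extends CommandSketch {

    @Override
    protected void doInitialize() {
        // register a command-specific option during initialization
        addOption(new Option("u", "baseUrl", true, "The URL of the target instance"));
    }

    public static void main(final String[] args) {
        final CommandSketch command = new ListItemsCommandSketch();
        command.initialize();
        System.out.println("options registered: " + command.getOptions().getOptions().size()); // 1
    }
}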
@ -51,7 +51,7 @@ public abstract class AbstractCommandGroup implements CommandGroup {
 }

 /**
-* Sub-classes override to provide the appropriate commands for the given group.
+* Subclasses override to provide the appropriate commands for the given group.
 *
 * @return the list of commands for this group
 */