NIFI-12457: Added additional use case based documentation and fixed some typos/made some clarifications in related docs

Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #8108.
This commit is contained in:
Mark Payne 2023-12-03 16:07:56 -05:00 committed by Pierre Villard
parent 4f26b93615
commit 283abadf97
No known key found for this signature in database
GPG Key ID: F92A93B30C07C6D5
4 changed files with 111 additions and 17 deletions

View File

@ -129,6 +129,63 @@ import java.util.concurrent.TimeUnit;
) )
} }
) )
@MultiProcessorUseCase(
description = "Retrieve only files from S3 that meet some specified criteria",
keywords = {"s3", "state", "retrieve", "filter", "select", "fetch", "criteria"},
configurations = {
@ProcessorConfiguration(
processorClass = ListS3.class,
configuration = """
The "Bucket" property should be set to the name of the S3 bucket that files reside in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize \
this property by setting it to something like `#{S3_SOURCE_BUCKET}`.
The "Region" property must be set to denote the S3 region that the Bucket resides in. If the flow being built is to be reused elsewhere, it's a good idea to parameterize \
this property by setting it to something like `#{S3_SOURCE_REGION}`.
The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
The 'success' Relationship of this Processor is then connected to RouteOnAttribute.
"""
),
@ProcessorConfiguration(
processorClassName = "org.apache.nifi.processors.standard.RouteOnAttribute",
configuration = """
If you would like to "OR" together all of the conditions (i.e., the file should be retrieved if any of the conditions are met), \
set "Routing Strategy" to "Route to 'matched' if any matches".
If you would like to "AND" together all of the conditions (i.e., the file should only be retrieved if all of the conditions are met), \
set "Routing Strategy" to "Route to 'matched' if all match".
For each condition that you would like to filter on, add a new property. The name of the property should describe the condition. \
The value of the property should be an Expression Language expression that returns `true` if the file meets the condition or `false` \
if the file does not meet the condition.
Some attributes that you may consider filtering on are:
- `filename` (the name of the file)
- `s3.length` (the number of bytes in the file)
- `s3.tag.<tag name>` (the value of the s3 tag with the name `tag name`)
- `s3.user.metadata.<key name>` (the value of the user metadata with the key named `key name`)
For example, to fetch only files that are at least 1 MB and have a filename ending in `.zip` we would set the following properties:
- "Routing Strategy" = "Route to 'matched' if all match"
- "At least 1 MB" = "${s3.length:ge(1000000)}"
- "Ends in .zip" = "${filename:endsWith('.zip')}"
Auto-terminate the `unmatched` Relationship.
Connect the `matched` Relationship to the FetchS3Object processor.
"""
),
@ProcessorConfiguration(
processorClass = FetchS3Object.class,
configuration = """
"Bucket" = "${s3.bucket}"
"Object Key" = "${filename}"
The "AWS Credentials Provider service" property should specify an instance of the AWSCredentialsProviderControllerService in order to provide credentials for accessing the bucket.
The "Region" property must be set to the same value as the "Region" property of the ListS3 Processor.
"""
)
}
)
@MultiProcessorUseCase( @MultiProcessorUseCase(
description = "Retrieve new files as they arrive in an S3 bucket", description = "Retrieve new files as they arrive in an S3 bucket",
notes = "This method of retrieving files from S3 is more efficient than using ListS3 and more cost effective. It is the pattern recommended by AWS. " + notes = "This method of retrieving files from S3 is more efficient than using ListS3 and more cost effective. It is the pattern recommended by AWS. " +

View File

@ -117,7 +117,7 @@ import java.util.stream.Stream;
Choose a RecordWriter that writes the data in the desired output format. Choose a RecordWriter that writes the data in the desired output format.
Add a single additional property. The name of the property should describe the criteria. \ Add a single additional property. The name of the property should describe the criteria. \
The property's value should be a RecordPath that returns `true` if the Record meets the criteria or `false otherwise. The property's value should be a RecordPath that returns `true` if the Record meets the criteria or `false` otherwise.
For example, if we want to separate records based on whether or not they have a transaction total of more than $1,000 we could add a new property named \ For example, if we want to separate records based on whether or not they have a transaction total of more than $1,000 we could add a new property named \
`largeTransaction` with a value of `/transaction/total > 1000`. This will create two FlowFiles. In the first, all records will have a total over `1000`. \ `largeTransaction` with a value of `/transaction/total > 1000`. This will create two FlowFiles. In the first, all records will have a total over `1000`. \

View File

@ -26,6 +26,8 @@ import org.apache.nifi.annotation.behavior.SupportsBatching;
import org.apache.nifi.annotation.behavior.WritesAttribute; import org.apache.nifi.annotation.behavior.WritesAttribute;
import org.apache.nifi.annotation.behavior.WritesAttributes; import org.apache.nifi.annotation.behavior.WritesAttributes;
import org.apache.nifi.annotation.documentation.CapabilityDescription; import org.apache.nifi.annotation.documentation.CapabilityDescription;
import org.apache.nifi.annotation.documentation.MultiProcessorUseCase;
import org.apache.nifi.annotation.documentation.ProcessorConfiguration;
import org.apache.nifi.annotation.documentation.Tags; import org.apache.nifi.annotation.documentation.Tags;
import org.apache.nifi.annotation.documentation.UseCase; import org.apache.nifi.annotation.documentation.UseCase;
import org.apache.nifi.annotation.lifecycle.OnScheduled; import org.apache.nifi.annotation.lifecycle.OnScheduled;
@ -67,20 +69,21 @@ import java.util.concurrent.atomic.AtomicReference;
@InputRequirement(Requirement.INPUT_REQUIRED) @InputRequirement(Requirement.INPUT_REQUIRED)
@Tags({"attributes", "routing", "Attribute Expression Language", "regexp", "regex", "Regular Expression", "Expression Language", "find", "text", "string", "search", "filter", "detect"}) @Tags({"attributes", "routing", "Attribute Expression Language", "regexp", "regex", "Regular Expression", "Expression Language", "find", "text", "string", "search", "filter", "detect"})
@CapabilityDescription("Routes FlowFiles based on their Attributes using the Attribute Expression Language") @CapabilityDescription("Routes FlowFiles based on their Attributes using the Attribute Expression Language")
@DynamicProperty(name = "Relationship Name", value = "Attribute Expression Language", @DynamicProperty(name = "Relationship Name", value = "Expression Language expression that returns a boolean value indicating whether or not the FlowFile should be routed to this Relationship",
expressionLanguageScope = ExpressionLanguageScope.FLOWFILE_ATTRIBUTES, description = "Routes FlowFiles whose attributes match the " expressionLanguageScope = ExpressionLanguageScope.FLOWFILE_ATTRIBUTES,
+ "Attribute Expression Language specified in the Dynamic Property Value to the Relationship specified in the Dynamic Property Key") description = "Routes FlowFiles whose attributes match the Expression Language specified in the Dynamic Property Value to the Relationship " +
"specified in the Dynamic Property Key")
@DynamicRelationship(name = "Name from Dynamic Property", description = "FlowFiles that match the Dynamic Property's Attribute Expression Language") @DynamicRelationship(name = "Name from Dynamic Property", description = "FlowFiles that match the Dynamic Property's Attribute Expression Language")
@WritesAttributes({ @WritesAttributes({
@WritesAttribute(attribute = RouteOnAttribute.ROUTE_ATTRIBUTE_KEY, description = "The relation to which the FlowFile was routed") @WritesAttribute(attribute = RouteOnAttribute.ROUTE_ATTRIBUTE_KEY, description = "The relation to which the FlowFile was routed")
}) })
@UseCase( @UseCase(
description = "Route data to one or more relationships based on its attributes.", description = "Route data to one or more relationships based on its attributes using the NiFi Expression Language.",
keywords = {"attributes", "routing", "expression language"}, keywords = {"attributes", "routing", "expression language"},
configuration = """ configuration = """
Set the "Routing Strategy" property to "Route to Property name". Set the "Routing Strategy" property to "Route to Property name".
For each route that a FlowFile might be routed to, add a new property. The name of the property should describe the route. For each route that a FlowFile might be routed to, add a new property. The name of the property should describe the route.
The value of the property is an Attribute Expression Language expression that will be used to determine whether or not a given FlowFile will be routed to the \ The value of the property is an Attribute Expression Language expression that returns a boolean value indicating whether or not a given FlowFile will be routed to the \
associated relationship. associated relationship.
For example, we might route data based on its file extension using the following properties: For example, we might route data based on its file extension using the following properties:
@ -133,6 +136,39 @@ import java.util.concurrent.atomic.AtomicReference;
Connect the 'unmatched' relationship to the next processor in the flow. Connect the 'unmatched' relationship to the next processor in the flow.
""" """
) )
@MultiProcessorUseCase(
description = "Route record-oriented data based on whether or not the record's values meet some criteria",
keywords = {"record", "route", "content", "data"},
configurations = {
@ProcessorConfiguration(
processorClass = PartitionRecord.class,
configuration = """
Choose a RecordReader that is appropriate based on the format of the incoming data.
Choose a RecordWriter that writes the data in the desired output format.
Add a single additional property. The name of the property should describe the criteria to route on. \
The property's value should be a RecordPath that returns `true` if the Record meets the criteria or `false` otherwise. \
This adds a new attribute to the FlowFile whose name is equal to the property name.
Connect the 'success' Relationship to RouteOnAttribute.
"""
),
@ProcessorConfiguration(
processorClass = RouteOnAttribute.class,
configuration = """
Set "Routing Strategy" to "Route to Property name"
Add two additional properties. For the first one, the name of the property should describe data that matches the criteria. \
The value is an Expression Language expression that checks if the attribute added by the PartitionRecord processor has a value of `true`. \
For example, `${criteria:equals('true')}`.
The second property should have a name that describes data that does not match the criteria. The value is an Expression Language that evaluates to the \
opposite of the first property value. For example, `${criteria:equals('true'):not()}`.
Connect each of the newly created Relationships to the appropriate downstream processors.
"""
)
}
)
public class RouteOnAttribute extends AbstractProcessor { public class RouteOnAttribute extends AbstractProcessor {
public static final String ROUTE_ATTRIBUTE_KEY = "RouteOnAttribute.Route"; public static final String ROUTE_ATTRIBUTE_KEY = "RouteOnAttribute.Route";

View File

@ -60,7 +60,6 @@ import java.util.Map;
import java.util.stream.Collectors; import java.util.stream.Collectors;
import java.util.stream.Stream; import java.util.stream.Stream;
@SideEffectFree @SideEffectFree
@SupportsBatching @SupportsBatching
@InputRequirement(Requirement.INPUT_REQUIRED) @InputRequirement(Requirement.INPUT_REQUIRED)
@ -124,8 +123,9 @@ import java.util.stream.Stream;
""" """
) )
@UseCase( @UseCase(
description = "Change the format of a record field.", description = "Change the format of a record field's value.",
keywords = {"change", "update", "replace", "insert", "transform", "format", "date/time", "timezone", "expression language"}, keywords = {"change", "update", "replace", "insert", "transform", "format", "date/time", "timezone", "expression language"},
notes = "Use the RenameRecordField Processor in order to change a field's name.",
configuration = """ configuration = """
"Replacement Value Strategy" = "Literal Value" "Replacement Value Strategy" = "Literal Value"
@ -358,3 +358,4 @@ public class UpdateRecord extends AbstractRecordProcessor {
} }
} }
} }