NIFI-6208: Fixes typos in Developers Guide

Signed-off-by: Pierre Villard <pierre.villard.fr@gmail.com>

This closes #3431.
Lars Francke 2019-04-12 16:08:15 +02:00 committed by Pierre Villard
parent b5ff622118
commit bfcc0ebd03
No known key found for this signature in database
GPG Key ID: BEE1599F0726E9CD
1 changed files with 93 additions and 93 deletions

@@ -209,7 +209,7 @@ Expression Language.
[[relationship]]
==== Relationship
-Relationships define the routes to which a FlowFile may be transfered
+Relationships define the routes to which a FlowFile may be transferred
from a Processor. Relationships
are created by instantiating an instance of the `Relationship.Builder`
class, calling the appropriate methods
@@ -433,7 +433,7 @@ be taken to ensure that the Processor is thread-safe!
=== Component Lifecycle
The NiFi API provides lifecycle support through use of Java
-Annotations. The `org.apache.nifi.annotations.lifecycle` package
+Annotations. The `org.apache.nifi.annotation.lifecycle` package
contains
several annotations for lifecycle management. The following
Annotations may be applied to Java methods in a NiFi component to
@@ -567,7 +567,7 @@ relied upon for critical business logic.
=== Component Notification
The NiFi API provides notification support through use of Java
-Annotations. The `org.apache.nifi.annotations.notification` package
+Annotations. The `org.apache.nifi.annotation.notification` package
contains several annotations for notification management. The following
annotations may be applied to Java methods in a NiFi component to
indicate to the framework when the methods should be called. For the
@@ -644,7 +644,7 @@ State is stored using the StateManager's `getState`, `setState`, `replace`, and
require that a Scope be provided. It should be noted that the state that is stored with the Local scope is entirely different
than state stored with a Cluster scope. If a Processor stores a value with the key of _My Key_ using the `Scope.CLUSTER` scope,
and then attempts to retrieve the value using the `Scope.LOCAL` scope, the value retrieved will be `null` (unless a value was
-also stored with the same key using the `Scope.CLUSTER` scope). Each Processor's state, is stored in isolation from other
+also stored with the same key using the `Scope.CLUSTER` scope). Each Processor's state is stored in isolation from other
Processors' state.
It follows, then, that two Processors cannot share the same state. There are, however, some circumstances in which it is very
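The Local/Cluster isolation described in this hunk can be pictured with a toy model. This is a stdlib-only sketch, not NiFi's `StateManager`: `ToyScope` and `ToyStateStore` are hypothetical names, and the only behavior modeled is the one the text states, namely that each scope keeps its own independent key/value space.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a state scope; not the NiFi Scope enum.
enum ToyScope { LOCAL, CLUSTER }

// Toy model: every scope owns a separate map, so keys never leak across scopes.
final class ToyStateStore {
    private final Map<ToyScope, Map<String, String>> stateByScope = new EnumMap<>(ToyScope.class);

    ToyStateStore() {
        for (ToyScope scope : ToyScope.values()) {
            stateByScope.put(scope, new HashMap<>());
        }
    }

    void put(String key, String value, ToyScope scope) {
        stateByScope.get(scope).put(key, value);
    }

    // Returns null if the key was never stored under the given scope.
    String get(String key, ToyScope scope) {
        return stateByScope.get(scope).get(key);
    }
}
```

Storing _My Key_ under `CLUSTER` and reading it back under `LOCAL` yields `null` in this model, matching the behavior the guide describes.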
@@ -749,11 +749,11 @@ Each of those values can also be given a description:
[source,java]
----
public static final AllowableValue EXTENSIVE = new AllowableValue("Extensive", "Extensive",
-"Everything will be logged - use with caution!");
+"Everything will be logged - use with caution!");
public static final AllowableValue VERBOSE = new AllowableValue("Verbose", "Verbose",
-"Quite a bit of logging will occur");
+"Quite a bit of logging will occur");
public static final AllowableValue REGULAR = new AllowableValue("Regular", "Regular",
-"Typical logging will occur");
+"Typical logging will occur");
public static final PropertyDescriptor LOG_LEVEL = new PropertyDescriptor.Builder()
.name("Amount to Log")
@@ -782,7 +782,7 @@ public static final Relationship MY_RELATIONSHIP = new Relationship.Builder()
=== Documenting Capability and Keywords
-The `org.apache.nifi.annotations.documentation` package provides Java
+The `org.apache.nifi.annotation.documentation` package provides Java
annotations that can be used to document components. The
CapabilityDescription
annotation can be added to a Processor, Reporting Task, or Controller
@@ -805,7 +805,7 @@ example of using these annotations is provided below:
----
@Tags({"example", "documentation", "developer guide", "processor", "tags"})
@CapabilityDescription("Example Processor that provides no real functionality but is provided" +
-" for an example in the Developer Guide")
+" for an example in the Developer Guide")
public static final ExampleProcessor extends Processor {
...
}
@@ -1111,7 +1111,7 @@ The Processor then emits a Provenance ROUTE event indicating which
Relationship to which the Processor routed the FlowFile.
This Processor is annotated with the `@SideEffectFree` and
-`@SupportsBatching` annotations from the `org.apache.nifi.annotations.behavior`
+`@SupportsBatching` annotations from the `org.apache.nifi.annotation.behavior`
package.
@@ -1188,7 +1188,7 @@ emitted for each FlowFile. This is logged, and the method returns.
This Processor is annotated with the `@SideEffectFree` and
`@SupportsBatching` annotations from the
-`org.apache.nifi.annotations.behavior`
+`org.apache.nifi.annotation.behavior`
package.
@@ -1366,7 +1366,7 @@ event
is emitted. Typically, ROUTE events are not emitted when routing a
FlowFile to `success` because this generates a very verbose lineage
that
-becomes difficult to navigate. However, in this case,the event is
+becomes difficult to navigate. However, in this case, the event is
useful because we would otherwise expect a FORK event and the absence
of
any event is likely to cause confusion. The fact that the FlowFile was
@@ -1468,7 +1468,7 @@ are updated to include the following attributes:
| Attribute Name | Description
| `split.parent.uuid` | The UUID of the original FlowFile
| `split.index` | A one-up number indicating which FlowFile in the list this is (the first FlowFile
-created will have a value `0`, the second will have a value `1`, etc.)
+created will have a value `0`, the second will have a value `1`, etc.)
| `split.count` | The total number of split FlowFiles that were created
|===
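The three attributes in the table above follow a simple pattern. The sketch below computes them for `count` children of one parent; `SplitAttributes.forSplits` is a hypothetical helper, and only the attribute names and the zero-based, one-up index come from the table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the split.* attributes for each child of one parent.
final class SplitAttributes {
    static List<Map<String, String>> forSplits(String parentUuid, int count) {
        List<Map<String, String>> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            Map<String, String> attributes = new HashMap<>();
            attributes.put("split.parent.uuid", parentUuid);
            attributes.put("split.index", String.valueOf(i));     // first split is 0
            attributes.put("split.count", String.valueOf(count)); // same on every split
            result.add(attributes);
        }
        return result;
    }
}
```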
@@ -1551,10 +1551,10 @@ handle unexpected errors during the course of their work.
During the execution of the `onTrigger` method of a Processor, many things can potentially go
awry. Common failure conditions include:
-- Incoming data is not in the expected format.
-- Network connections to external services fail.
-- Reading or writing data to a disk fails.
-- There is a bug in the Processor or a dependent library.
+- Incoming data is not in the expected format.
+- Network connections to external services fail.
+- Reading or writing data to a disk fails.
+- There is a bug in the Processor or a dependent library.
Any of these conditions can result in an Exception being thrown from the Processor. From the framework
perspective, there are two types of Exceptions that can escape a Processor: `ProcessException` and
@@ -1639,8 +1639,8 @@ by calling the `rollback()` or `rollback(boolean)` method. If using the latter,
FlowFiles that have been pulled from queues (via the ProcessSession `get` methods) should be penalized before being added
back to their queues.
-When `rollback` is called, any modification that has occurred to the FlowFiles in that session are discarded, to included
-both content modification and attribute modification. Additionally, all Provenance Events are rolled back (with the exception
+When `rollback` is called, any modification that has occurred to the FlowFiles in that session are discarded, this includes
+both content and attribute modifications. Additionally, all Provenance Events are rolled back (with the exception
of any SEND event that was emitted by passing a value of `true` for the `force` argument). The FlowFiles that were pulled from
the input queues are then transferred back to the input queues (and optionally penalized) so that they can be processed again.
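The rollback behavior described in this hunk can be modeled in a few lines. This is a toy, stdlib-only sketch rather than NiFi's `ProcessSession`: `ToyFlowFile` and `ToySession` are hypothetical names, only attribute changes and the re-queue/penalize step are modeled, and content, Provenance events, and the `force` flag are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a FlowFile: immutable attributes plus a penalized flag.
final class ToyFlowFile {
    final Map<String, String> attributes;
    final boolean penalized;

    ToyFlowFile(Map<String, String> attributes, boolean penalized) {
        this.attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
        this.penalized = penalized;
    }
}

// Toy session: modifications produce new copies, and rollback returns
// the untouched originals to the input queue.
final class ToySession {
    private final Deque<ToyFlowFile> inputQueue;
    private final List<ToyFlowFile> pulledOriginals = new ArrayList<>();

    ToySession(Deque<ToyFlowFile> inputQueue) {
        this.inputQueue = inputQueue;
    }

    ToyFlowFile get() {
        ToyFlowFile flowFile = inputQueue.poll();
        if (flowFile != null) {
            pulledOriginals.add(flowFile);
        }
        return flowFile;
    }

    // The change is recorded on a copy; the original stays exactly as it was pulled.
    ToyFlowFile putAttribute(ToyFlowFile flowFile, String key, String value) {
        Map<String, String> updated = new HashMap<>(flowFile.attributes);
        updated.put(key, value);
        return new ToyFlowFile(updated, flowFile.penalized);
    }

    // Discards all modifications by re-queuing the originals, optionally penalized.
    // This toy appends them to the tail; real queue ordering is not modeled.
    void rollback(boolean penalize) {
        for (ToyFlowFile original : pulledOriginals) {
            inputQueue.addLast(penalize ? new ToyFlowFile(original.attributes, true) : original);
        }
        pulledOriginals.clear();
    }
}
```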
@@ -1649,10 +1649,10 @@ any Provenance Events that occurred are persisted in the Provenance Repository.
another FlowFile references the same piece of content), and the FlowFiles are transferred to the outbound queues so that the
next Processors can operate on the data.
-It is also important to note how this behavior is affected by using the `org.apache.nifi.annotations.behavior.SupportsBatching`
-annotation. If a Processor utilizes this annotation, calls to `ProcessSession.commit` may not take affect immediately. Rather,
+It is also important to note how this behavior is affected by using the `org.apache.nifi.annotation.behavior.SupportsBatching`
+annotation. If a Processor utilizes this annotation, calls to `ProcessSession.commit` may not take effect immediately. Rather,
these commits may be batched together in order to provide higher throughput. However, if at any point, the Processor rolls back
-the ProcessSession, all changes since the last call to `commit` will be discarded and all "batched" commits will take affect.
+the ProcessSession, all changes since the last call to `commit` will be discarded and all "batched" commits will take effect.
These "batched" commits are not rolled back.
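The batching caveat in this hunk is easier to see in a toy model. `ToyBatchingSession` is hypothetical and stdlib-only; its `flush()` method stands in for whenever the framework actually persists a batch, which is an assumption made purely for illustration. The model captures the two rules stated above: a batched `commit()` is not immediately durable, and a rollback discards only uncommitted changes while already-batched commits still take effect.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of batched commits; not the NiFi implementation.
final class ToyBatchingSession {
    private final List<String> uncommitted = new ArrayList<>(); // changes since the last commit()
    private final List<String> batched = new ArrayList<>();     // committed but not yet persisted
    private final List<String> persisted = new ArrayList<>();   // what the "repositories" hold

    void change(String description) {
        uncommitted.add(description);
    }

    // With batching, commit() only moves changes into the pending batch.
    void commit() {
        batched.addAll(uncommitted);
        uncommitted.clear();
    }

    // Stand-in for the framework persisting the whole batch at once.
    void flush() {
        persisted.addAll(batched);
        batched.clear();
    }

    // Discards changes since the last commit(); batched commits still take effect.
    void rollback() {
        uncommitted.clear();
        flush();
    }

    List<String> persisted() {
        return new ArrayList<>(persisted);
    }
}
```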
@@ -1671,7 +1671,7 @@ important knowledge about the context that others do not have. Documentation sho
less familiar with the process are able to use it with ease.
When thinking about the user experience, it is also important to note that consistency is very important. It is best
-to stick with the standard <<naming-convensions>>. This is true for Processor names, Property names and value, Relationship
+to stick with the standard <<naming-conventions>>. This is true for Processor names, Property names and value, Relationship
names, and any other aspect that the user will experience.
Simplicity is crucial! Avoid adding properties that you don't expect users to understand or change. As developers, we are
@@ -1688,31 +1688,31 @@ format Y and send the newly-formatted data to some external service.
Taking this approach of formatting the data for a particular endpoint and then sending the data to that endpoint within the same Processor
has several drawbacks:
-- The Processor becomes very complex, as it has to perform the data translation task as well as the task of
-sending the data to the remote service.
-- If the Processor is unable to communicate with the remote service, it will route the data to a `failure` Relationship. In this case,
-the Processor will be responsible to perform the data translation again. And if it fails again, the translation is done yet again.
-- If we have five different Processors that translate the incoming data into this new format before sending the data, we have a great
-deal of duplicated code. If the schema changes, for instance, many Processors must be updated.
-- This intermediate data is thrown away when the Processor finishes sending to the remote service. The intermediate data format
-may well be useful to other Processors.
+- The Processor becomes very complex, as it has to perform the data translation task as well as the task of
+sending the data to the remote service.
+- If the Processor is unable to communicate with the remote service, it will route the data to a `failure` Relationship. In this case,
+the Processor will be responsible to perform the data translation again. And if it fails again, the translation is done yet again.
+- If we have five different Processors that translate the incoming data into this new format before sending the data, we have a great
+deal of duplicated code. If the schema changes, for instance, many Processors must be updated.
+- This intermediate data is thrown away when the Processor finishes sending to the remote service. The intermediate data format
+may well be useful to other Processors.
In order to avoid these issues, and make Processors more reusable, a Processor should always stick to the principal of "do one thing and do
it well." Such a Processor should be broken into two separate Processors: one to convert the data from Format X to Format Y, and another
Processor to send data to the remote resource.
-[[naming-convensions]]
+[[naming-conventions]]
=== Naming Conventions
In order to deliver a consistent look and feel to users, it is advisable that Processors keep with standard naming conventions. The following
is a list of standard conventions that are used:
-- Processors that pull data from a remote system are named Get<Service> or Get<Protocol>, depending on if they poll data from arbitrary
-sources over a known Protocol (such as GetHTTP or GetFTP) or if they pull data from a known service (such as GetKafka)
-- Processors that push data to a remote system are named Put<Service> or Put<Protocol>.
-- Relationship names are lower-cased and use spaces to delineated words.
-- Property names capitalize significant words, as would be done with the title of a book.
+- Processors that pull data from a remote system are named Get<Service> or Get<Protocol>, depending on if they poll data from arbitrary
+sources over a known Protocol (such as GetHTTP or GetFTP) or if they pull data from a known service (such as GetKafka)
+- Processors that push data to a remote system are named Put<Service> or Put<Protocol>.
+- Relationship names are lower-cased and use spaces to delineated words.
+- Property names capitalize significant words, as would be done with the title of a book.
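The Relationship and Property naming conventions listed above are mechanical enough to check in code. The sketch below is a hypothetical lint helper, not part of NiFi. The set of minor words that title case may leave lower-cased is an assumption, chosen so that the guide's own `Amount to Log` property name passes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical lint helper for the naming conventions; not a NiFi API.
final class NamingChecks {
    // Lower-case words separated by single spaces, e.g. "success" or "not found".
    private static final Pattern RELATIONSHIP = Pattern.compile("[a-z0-9]+( [a-z0-9]+)*");

    // Assumed set of minor words that title case leaves lower-cased.
    private static final Set<String> MINOR_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "by", "with"));

    static boolean isRelationshipName(String name) {
        return RELATIONSHIP.matcher(name).matches();
    }

    // Title case: the first word is capitalized; later words are capitalized unless minor.
    static boolean isPropertyName(String name) {
        String[] words = name.split(" ", -1); // -1 keeps trailing empty strings
        for (int i = 0; i < words.length; i++) {
            String word = words[i];
            if (word.isEmpty()) {
                return false; // leading, trailing, or doubled space
            }
            char first = word.charAt(0);
            boolean capitalized = Character.isUpperCase(first) || Character.isDigit(first);
            if (!capitalized && (i == 0 || !MINOR_WORDS.contains(word))) {
                return false;
            }
        }
        return true;
    }
}
```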
@@ -1720,36 +1720,36 @@ is a list of standard conventions that are used:
When creating a Processor, the developer is able to provide hints to the framework about how to utilize the Processor most
effectively. This is done by applying annotations to the Processor's class. The annotations that can be applied to a
-Processor exist in three sub-packages of `org.apache.nifi.annotations`. Those in the `documentation` sub-package are used
+Processor exist in three sub-packages of `org.apache.nifi.annotation`. Those in the `documentation` sub-package are used
to provide documentation to the user. Those in the `lifecycle` sub-package instruct the framework which methods should
be called on the Processor in order to respond to the appropriate life-cycle events. Those in the `behavior` package
help the framework understand how to interact with the Processor in terms of scheduling and general behavior.
-The following annotations from the `org.apache.nifi.annotations.behavior` package can be used to modify how the framework
+The following annotations from the `org.apache.nifi.annotation.behavior` package can be used to modify how the framework
will handle your Processor:
-- `EventDriven`: Instructs the framework that the Processor can be scheduled using the Event-Driven scheduling
-strategy. This strategy is still experimental at this point, but can result in reduced resource utilization
-on dataflows that do not handle extremely high data rates.
+- `EventDriven`: Instructs the framework that the Processor can be scheduled using the Event-Driven scheduling
+strategy. This strategy is still experimental at this point, but can result in reduced resource utilization
+on dataflows that do not handle extremely high data rates.
-- `SideEffectFree`: Indicates that the Processor does not have any side effects external to NiFi. As a result, the
-framework is free to invoke the Processor many times with the same input without causing any unexpected
-results to occur. This implies idempotent behavior. This can be used by the framework to improve efficiency by
-performing actions such as transferring a ProcessSession from one Processor to another, such that if
-a problem occurs many Processors' actions can be rolled back and performed again.
+- `SideEffectFree`: Indicates that the Processor does not have any side effects external to NiFi. As a result, the
+framework is free to invoke the Processor many times with the same input without causing any unexpected
+results to occur. This implies idempotent behavior. This can be used by the framework to improve efficiency by
+performing actions such as transferring a ProcessSession from one Processor to another, such that if
+a problem occurs many Processors' actions can be rolled back and performed again.
-- `SupportsBatching`: This annotation indicates that it is okay for the framework to batch together multiple
-ProcessSession commits into a single commit. If this annotation is present, the user will be able to choose
-whether they prefer high throughput or lower latency in the Processor's Scheduling tab. This annotation should
-be applied to most Processors, but it comes with a caveat: if the Processor calls `ProcessSession.commit`,
-there is no guarantee that the data has been safely stored in NiFi's Content, FlowFile, and Provenance Repositories.
-As a result, it is not appropriate for those Processors that receive data from an external source, commit the session,
-and then delete the remote data or confirm a transaction with a remote resource.
+- `SupportsBatching`: This annotation indicates that it is okay for the framework to batch together multiple
+ProcessSession commits into a single commit. If this annotation is present, the user will be able to choose
+whether they prefer high throughput or lower latency in the Processor's Scheduling tab. This annotation should
+be applied to most Processors, but it comes with a caveat: if the Processor calls `ProcessSession.commit`,
+there is no guarantee that the data has been safely stored in NiFi's Content, FlowFile, and Provenance Repositories.
+As a result, it is not appropriate for those Processors that receive data from an external source, commit the session,
+and then delete the remote data or confirm a transaction with a remote resource.
-- `TriggerSerially`: When this annotation is present, the framework will not allow the user to schedule more than one
-concurrent thread to execute the `onTrigger` method at a time. Instead, the number of thread ("Concurrent Tasks")
-will always be set to `1`. This does *not*, however, mean that the Processor does not have to be thread-safe,
-as the thread that is executing `onTrigger` may change between invocations.
+- `TriggerSerially`: When this annotation is present, the framework will not allow the user to schedule more than one
+concurrent thread to execute the `onTrigger` method at a time. Instead, the number of thread ("Concurrent Tasks")
+will always be set to `1`. This does *not*, however, mean that the Processor does not have to be thread-safe,
+as the thread that is executing `onTrigger` may change between invocations.
- `PrimaryNodeOnly`: Apache NiFi, when clustered, offers two modes of execution for Processors: "Primary Node" and
"All Nodes". Although running in all the nodes offers better parallelism, some Processors are known to cause unintended
@@ -1757,20 +1757,20 @@ will handle your Processor:
Processors are scheduled to run on "All Nodes", it will cause unnecessary duplication and even errors. Such Processors
should use this annotation. Applying this annotation will restrict the Processor to run only on the "Primary Node".
-- `TriggerWhenAnyDestinationAvailable`: By default, NiFi will not schedule a Processor to run if any of its outbound
-queues is full. This allows back-pressure to be applied all the way a chain of Processors. However, some Processors
-may need to run even if one of the outbound queues is full. This annotations indicates that the Processor should run
-if any Relationship is "available." A Relationship is said to be "available" if none of the connections that use
-that Relationship is full. For example, the DistributeLoad Processor makes use of this annotation. If the "round robin"
-scheduling strategy is used, the Processor will not run if any outbound queue is full. However, if the "next available"
-scheduling strategy is used, the Processor will run if any Relationship at all is available and will route FlowFiles
-only to those relationships that are available.
+- `TriggerWhenAnyDestinationAvailable`: By default, NiFi will not schedule a Processor to run if any of its outbound
+queues is full. This allows back-pressure to be applied all the way a chain of Processors. However, some Processors
+may need to run even if one of the outbound queues is full. This annotation indicates that the Processor should run
+if any Relationship is "available." A Relationship is said to be "available" if none of the connections that use
+that Relationship is full. For example, the DistributeLoad Processor makes use of this annotation. If the "round robin"
+scheduling strategy is used, the Processor will not run if any outbound queue is full. However, if the "next available"
+scheduling strategy is used, the Processor will run if any Relationship at all is available and will route FlowFiles
+only to those relationships that are available.
-- `TriggerWhenEmpty`: The default behavior is to trigger a Processor to run only if its input queue has at least one
-FlowFile or if the Processor has no input queues (which is typical of a "source" Processor). Applying this annotation
-will cause the framework to ignore the size of the input queues and trigger the Processor regardless of whether or
-not there is any data on an input queue. This is useful, for example, if the Processor needs to be triggered to run
-periodically to time out a network connection.
+- `TriggerWhenEmpty`: The default behavior is to trigger a Processor to run only if its input queue has at least one
+FlowFile or if the Processor has no input queues (which is typical of a "source" Processor). Applying this annotation
+will cause the framework to ignore the size of the input queues and trigger the Processor regardless of whether or
+not there is any data on an input queue. This is useful, for example, if the Processor needs to be triggered to run
+periodically to time out a network connection.
- `InputRequirement`: By default, all Processors will allow users to create incoming connections for the Processor, but
if the user does not create an incoming connection, the Processor is still valid and can be scheduled to run. For Processors
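The "available" rule behind `TriggerWhenAnyDestinationAvailable` can be sketched as two predicates over a Processor's outbound connections. This is a simplified, hypothetical model: `AvailabilityModel` and its `Connection` class are not NiFi types, and "full" is reduced to a single object-count threshold, whereas real back-pressure settings may involve more than that.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of destination availability; not NiFi code.
final class AvailabilityModel {
    // A toy connection: current queue size against a single back-pressure threshold.
    static final class Connection {
        final int queued;
        final int threshold;

        Connection(int queued, int threshold) {
            this.queued = queued;
            this.threshold = threshold;
        }

        boolean isFull() {
            return queued >= threshold;
        }
    }

    // A Relationship is "available" if none of the connections that use it is full.
    static boolean isAvailable(List<Connection> connections) {
        for (Connection connection : connections) {
            if (connection.isFull()) {
                return false;
            }
        }
        return true;
    }

    // Default behavior: do not run if any outbound queue is full.
    static boolean shouldRunByDefault(Map<String, List<Connection>> byRelationship) {
        for (List<Connection> connections : byRelationship.values()) {
            if (!isAvailable(connections)) {
                return false;
            }
        }
        return true;
    }

    // With the annotation: run if any Relationship at all is available.
    static boolean shouldRunWhenAnyAvailable(Map<String, List<Connection>> byRelationship) {
        for (List<Connection> connections : byRelationship.values()) {
            if (isAvailable(connections)) {
                return true;
            }
        }
        return false;
    }
}
```

With one full and one empty Relationship, the default rule declines to run while the annotated rule runs, which is the difference the DistributeLoad example relies on.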
@@ -1894,7 +1894,7 @@ In order to make use of this service, the Processor can use code such as:
[source,java]
----
final SSLContextService sslContextService = context.getProperty(SSL_CONTEXT_SERVICE)
-.asControllerService(SSLContextService.class);
+.asControllerService(SSLContextService.class);
----
Note here that `SSLContextService` is an interface that extends
@@ -2138,9 +2138,9 @@ you can use the Maven dependency:
[source]
----
<dependency>
-<groupId>org.apache.nifi</groupId>
-<artifactId>nifi-mock</artifactId>
-<version>${nifi version}</version>
+<groupId>org.apache.nifi</groupId>
+<artifactId>nifi-mock</artifactId>
+<version>${nifi version}</version>
</dependency>
----
@@ -2201,7 +2201,7 @@ invoke any method in the Processor with an `@OnScheduled` annotation, call the P
and then run the `@OnUnscheduled` and finally `@OnStopped` methods.
If it is desirable to run several iterations of the `onTrigger` method before the other `@OnUnscheduled` and
-`@OnStopped` life-cycle events are triggered, the `run(int)` method can be used to specify now many iterations
+`@OnStopped` life-cycle events are triggered, the `run(int)` method can be used to specify how many iterations
of `onTrigger` should be called.
There are times when we want to trigger the Processor to run but not trigger the `@OnUnscheduled` and `@OnStopped`
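The order of calls described for `run(int)` can be written out as a toy sequence. `LifecycleSequence.run` is a hypothetical stand-in, not the `nifi-mock` `TestRunner`; it only records step names in the order the text gives: the `@OnScheduled` methods, the requested number of `onTrigger` calls, then `@OnUnscheduled` and finally `@OnStopped`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder of the lifecycle order described for run(int); not TestRunner.
final class LifecycleSequence {
    static List<String> run(int iterations) {
        List<String> calls = new ArrayList<>();
        calls.add("@OnScheduled");
        for (int i = 0; i < iterations; i++) {
            calls.add("onTrigger"); // one entry per requested iteration
        }
        calls.add("@OnUnscheduled");
        calls.add("@OnStopped");
        return calls;
    }
}
```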
@@ -2256,22 +2256,22 @@ a subclass of the Processor in our unit test and use this:
----
@Test
public void testConnectionFailure() {
-final TestRunner runner = TestRunners.newTestRunner(new MyProcessor() {
-protected Client getClient() {
-// Return a mocked out client here.
-return new Client() {
-public void connect() throws IOException {
-throw new IOException();
-}
+final TestRunner runner = TestRunners.newTestRunner(new MyProcessor() {
+protected Client getClient() {
+// Return a mocked out client here.
+return new Client() {
+public void connect() throws IOException {
+throw new IOException();
+}
-// ...
-// other client methods
-// ...
-};
-}
-});
+// ...
+// other client methods
+// ...
+};
+}
+});
-// rest of unit test.
+// rest of unit test.
}
----
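The unit-test snippet in this hunk relies on the Processor obtaining its client through an overridable `protected` factory method. The stdlib-only sketch below shows that pattern in isolation. `Client`, `MyProcessor`, and `process()` are hypothetical stand-ins (no NiFi `TestRunner` is involved), and `process()` simply returns the name of the Relationship a FlowFile would be routed to.

```java
import java.io.IOException;

// Hypothetical client interface standing in for whatever the Processor talks to.
interface Client {
    void connect() throws IOException;
}

// Hypothetical processor: the client comes from an overridable factory method,
// so a test subclass can substitute a failing client.
class MyProcessor {
    protected Client getClient() {
        return () -> {
            // A real client would open a network connection here.
        };
    }

    // Returns the name of the Relationship the (toy) FlowFile would be routed to.
    String process() {
        try {
            getClient().connect();
            return "success";
        } catch (IOException e) {
            return "failure";
        }
    }
}
```

A test then overrides `getClient()` in an anonymous subclass, exactly as the guide's snippet does, and checks that the connection failure leads to the `failure` route.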
@@ -2438,10 +2438,10 @@ This is done so that any NAR that has a dependency on
`my-controller-service-api-nar` will also be able to access
all of the Controller Services that are provided by the
`nifi-standard-services-api-nar`, such as the SSLContextService.
-In this same vane, it is not necessary to create a different
+In this same vein, it is not necessary to create a different
"service-api" NAR for each service. Instead, it often
makes sense to have a single "service-api" NAR that encapsulates the
-API's for many different Controller Services,
+APIs for many different Controller Services,
as is done by the `nifi-standard-services-api-nar`. Generally, the API
will not include extensive dependencies, and as a result,
ClassLoader isolation may be less important, so lumping together many
@@ -2493,7 +2493,7 @@ NiFi provides the `@RequiresInstanceClassLoading` annotation to further expand a
available on a components classpath. You can annotate a component with `@RequiresInstanceClassLoading`
to indicate that the instance ClassLoader for the component requires a copy of all the resources in the
component's NAR ClassLoader. When `@RequiresInstanceClassLoading` is not present, the
-instance ClassLoader simply has it's parent ClassLoader set to the NAR ClassLoader, rather than
+instance ClassLoader simply has its parent ClassLoader set to the NAR ClassLoader, rather than
copying resources.
The `@RequiresInstanceClassLoading` annotation also provides an optional flag `cloneAncestorResources'. If