:batch-asciidoc: ./
:toc: left
:toclevels: 4
[[transactions]]
[appendix]
== Batch Processing and Transactions
[[transactionsNoRetry]]
=== Simple Batching with No Retry
Consider the following simple example of a nested batch with no retries. It shows a
common scenario for batch processing: An input source is processed until exhausted, and
we commit periodically at the end of a "chunk" of processing.
----
1   |  REPEAT(until=exhausted) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
3.1 |        input;
3.2 |        output;
    |      }
    |    }
    |
    |  }
----
The input operation (3.1) could be a message-based receive (such as from JMS), or a
file-based read, but to recover and continue processing with a chance of completing the
whole job, it must be transactional. The same applies to the operation at 3.2. It must
be either transactional or idempotent.
If the chunk at `REPEAT` (3) fails because of a database exception at 3.2, then `TX` (2)
must roll back the whole chunk.
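
The plan above can be sketched in plain Java. This is a minimal in-memory simulation, not Spring Batch API: the class `ChunkedBatchSketch`, the `run` method, and the `failOn` parameter (which simulates a write failure at 3.2) are all hypothetical names introduced for illustration. The "transaction" is modeled by staging writes and advancing the input position only on commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch of the plan above; none of these names are Spring Batch API.
class ChunkedBatchSketch {

    /**
     * Processes source in chunks of chunkSize and returns the committed read position.
     * A write "fails" for the item equal to failOn (an assumption used to simulate 3.2 failing).
     */
    static int run(List<String> source, int chunkSize, String failOn, List<String> committed) {
        int position = 0;                                   // last committed input position
        while (position < source.size()) {                  // 1: REPEAT(until=exhausted)
            List<String> staged = new ArrayList<>();        // 2: TX begins, writes are staged
            int cursor = position;
            try {
                for (int i = 0; i < chunkSize && cursor < source.size(); i++) { // 3: REPEAT(size)
                    String item = source.get(cursor++);     // 3.1: input
                    if (item.equals(failOn)) {
                        throw new IllegalStateException("write failed for " + item);
                    }
                    staged.add(item);                       // 3.2: output
                }
            } catch (IllegalStateException e) {
                // TX rolls back: staged writes are discarded and the input position is
                // not advanced, so a restart would see the whole chunk again.
                return position;
            }
            committed.addAll(staged);                       // TX commits the output...
            position = cursor;                              // ...and the input together
        }
        return position;
    }
}
```

Because there is no retry in this plan, the sketch stops at the first failed chunk. Everything committed before that chunk stays committed, and nothing from the failed chunk is visible.
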
[[transactionStatelessRetry]]
=== Simple Stateless Retry
It is also useful to use a retry for an operation which is not transactional, such as a
call to a web-service or other remote resource, as shown in the following example:
----
0   |  TX {
1   |    input;
1.1 |    output;
2   |    RETRY {
2.1 |      remote access;
    |    }
    |  }
----
This is actually one of the most useful applications of a retry, since a remote call is
much more likely to fail and be retryable than a database update. As long as the remote
access (2.1) eventually succeeds, the transaction, `TX` (0), commits. If the remote
access (2.1) eventually fails, then the transaction, `TX` (0), is guaranteed to roll
back.
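
As a rough illustration, the following plain-Java sketch wraps a remote call in a stateless retry loop inside a simulated transaction. The names `StatelessRetrySketch`, `retry`, and `runInTransaction` are hypothetical, and the "database" is just a list that receives staged writes on commit. It is not how Spring Batch or Spring Retry implement this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch; these names are illustrative and not part of any Spring API.
class StatelessRetrySketch {

    /** 2: RETRY. Repeats the call in place, without re-throwing, up to maxAttempts times. */
    static <T> T retry(int maxAttempts, Supplier<T> remoteCall) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return remoteCall.get();                  // 2.1: remote access
            } catch (RuntimeException e) {
                last = e;                                 // remember the failure and try again
            }
        }
        throw last;                                       // retries exhausted
    }

    /** 0: TX. Returns true if the simulated transaction committed, false if it rolled back. */
    static boolean runInTransaction(int maxAttempts, Supplier<String> remoteCall,
                                    List<String> database) {
        List<String> staged = new ArrayList<>();
        try {
            staged.add("output");                         // 1, 1.1: input and output
            staged.add(retry(maxAttempts, remoteCall));   // 2: retried remote access
        } catch (RuntimeException e) {
            return false;                                 // rollback: staged writes are dropped
        }
        database.addAll(staged);                          // commit
        return true;
    }
}
```

The retry happens entirely inside the transaction, which is safe here only because the remote call itself does not touch the transactional resource.
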
[[repeatRetry]]
=== Typical Repeat-Retry Pattern
The most typical batch processing pattern is to add a retry to the inner block of the
chunk, as shown in the following example:
----
1   |  REPEAT(until=exhausted, exception=not critical) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
    |
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
5.1 |          output;
6   |        } SKIP and RECOVER {
    |          notify;
    |        }
    |
    |      }
    |    }
    |
    |  }
----
The inner `RETRY` (4) block is marked as "stateful". See <<transactionsNoRetry,the
typical use case>> for a description of a stateful retry. This means that if the
retry `PROCESS` (5) block fails, the behavior of the `RETRY` (4) is as follows:
. Throw an exception, rolling back the transaction, `TX` (2), at the chunk level, and
allowing the item to be re-presented to the input queue.
. When the item re-appears, it might be retried depending on the retry policy in place,
executing `PROCESS` (5) again. The second and subsequent attempts might fail again and
re-throw the exception.
. Eventually, the item reappears for the final time. The retry policy disallows another
attempt, so `PROCESS` (5) is never executed. In this case, we follow the `RECOVER` (6)
path, effectively "skipping" the item that was received and is being processed.
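
The three steps above can be simulated in plain Java. This is a sketch under stated assumptions: the input is an in-memory queue that re-presents the items of a rolled-back chunk in their original order, the retry state is a failure count keyed by item, and `StatefulRetrySketch` and all of its members are hypothetical names rather than Spring Batch API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical simulation of a stateful retry; not Spring Batch API.
class StatefulRetrySketch {
    final Map<String, Integer> failures = new HashMap<>(); // retry state survives rollbacks
    final List<String> written = new ArrayList<>();        // committed output
    final List<String> skipped = new ArrayList<>();        // items that took the RECOVER path
    int rollbacks = 0;

    void run(List<String> items, int chunkSize, int maxAttempts, Predicate<String> fails) {
        Deque<String> queue = new ArrayDeque<>(items);
        while (!queue.isEmpty()) {                 // 1: REPEAT(until=exhausted, not critical)
            List<String> read = new ArrayList<>();
            List<String> stagedWrites = new ArrayList<>();
            List<String> stagedSkips = new ArrayList<>();
            try {                                  // 2: TX
                for (int i = 0; i < chunkSize && !queue.isEmpty(); i++) {   // 3: REPEAT(size)
                    String item = queue.poll();    // 4.1: input
                    read.add(item);
                    if (failures.getOrDefault(item, 0) >= maxAttempts) {
                        stagedSkips.add(item);     // 6: RECOVER, PROCESS is not executed
                        continue;
                    }
                    if (fails.test(item)) {        // 5: PROCESS fails
                        failures.merge(item, 1, Integer::sum);
                        throw new IllegalStateException("failed: " + item);
                    }
                    stagedWrites.add(item);        // 5.1: output
                }
                written.addAll(stagedWrites);      // commit
                skipped.addAll(stagedSkips);
            } catch (IllegalStateException e) {
                rollbacks++;                       // rollback: staged work is discarded
                for (int i = read.size() - 1; i >= 0; i--) {
                    queue.addFirst(read.get(i));   // items are re-presented to the input
                }
            }
        }
    }
}
```

With items `a`, `b`, `c`, a chunk size of 5, `maxAttempts` of 2, and a predicate that always fails for `b`, the chunk rolls back twice and then commits on the third pass with `a` and `c` written and `b` skipped.
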
Note that the notation used for the `RETRY` (4) in the plan above explicitly shows that
the input step (4.1) is part of the retry. It also makes clear that there are two
alternate paths for processing: the normal case, as denoted by `PROCESS` (5), and the
recovery path, as denoted in a separate block by `RECOVER` (6). The two alternate paths
are completely distinct. Only one is ever taken in normal circumstances.
In special cases (such as a special `TransactionValidException` type), the retry policy
might be able to determine that the `RECOVER` (6) path can be taken on the last attempt
after `PROCESS` (5) has just failed, instead of waiting for the item to be re-presented.
This is not the default behavior, because it requires detailed knowledge of what has
happened inside the `PROCESS` (5) block, which is not usually available. For example, if
the output included write access before the failure, then the exception should be
re-thrown to ensure transactional integrity.
The completion policy in the outer `REPEAT` (1) is crucial to the success of the above
plan. If the output (5.1) fails, it may throw an exception (it usually does, as
described), in which case the transaction, `TX` (2), fails, and the exception could
propagate up through the outer batch `REPEAT` (1). We do not want the whole batch to
stop, because the `RETRY` (4) might still be successful if we try again, so we add
`exception=not critical` to the outer `REPEAT` (1).
Note, however, that if the `TX` (2) fails and we __do__ try again, by virtue of the outer
completion policy, the item that is next processed in the inner `REPEAT` (3) is not
guaranteed to be the one that just failed. It might be, but it depends on the
implementation of the input (4.1). Thus, the output (5.1) might fail again on either a
new item or the old one. The client of the batch should not assume that each `RETRY` (4)
attempt is going to process the same items as the last one that failed. For example, if
the termination policy for `REPEAT` (1) is to fail after 10 attempts, it fails after 10
consecutive attempts but not necessarily at the same item. This is consistent with the
overall retry strategy. The inner `RETRY` (4) is aware of the history of each item and
can decide whether or not to have another attempt at it.
[[asyncChunkProcessing]]
=== Asynchronous Chunk Processing
The inner batches or chunks in the <<repeatRetry,typical example>> can be executed
concurrently by configuring the outer batch to use an `AsyncTaskExecutor`. The outer
batch waits for all the chunks to complete before completing. The following example shows
asynchronous chunk processing:
----
1   |  REPEAT(until=exhausted, concurrent, exception=not critical) {
    |
2   |    TX {
3   |      REPEAT(size=5) {
    |
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
    |          output;
6   |        } RECOVER {
    |          recover;
    |        }
    |
    |      }
    |    }
    |
    |  }
----
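
A minimal plain-Java sketch of the concurrent outer batch follows. It uses `java.util.concurrent.ExecutorService` instead of Spring's `AsyncTaskExecutor`, leaves out the retry, and models each chunk "transaction" by staging writes and publishing them in one step. `AsyncChunkSketch` and `run` are hypothetical names, and the upper-casing stands in for real processing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of concurrent chunk processing; not Spring Batch API.
class AsyncChunkSketch {

    static List<String> run(List<String> items, int chunkSize, int threads) {
        Queue<String> input = new ConcurrentLinkedQueue<>(items);   // thread-safe input source
        List<String> committed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Future<?>> workers = new ArrayList<>();
        try {
            for (int t = 0; t < threads; t++) {        // 1: REPEAT(until=exhausted, concurrent)
                workers.add(executor.submit(() -> {
                    while (true) {
                        List<String> staged = new ArrayList<>();    // 2: TX on this thread
                        for (int i = 0; i < chunkSize; i++) {       // 3: REPEAT(size)
                            String item = input.poll();             // 4.1: input
                            if (item == null) {
                                break;                              // input exhausted
                            }
                            staged.add(item.toUpperCase());         // output
                        }
                        if (staged.isEmpty()) {
                            return;                                 // nothing left to commit
                        }
                        committed.addAll(staged);                   // commit the chunk
                    }
                }));
            }
            for (Future<?> worker : workers) {         // the outer batch waits for all chunks
                worker.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("chunk failed", e.getCause());
        } finally {
            executor.shutdown();
        }
        return committed;
    }
}
```

Each chunk stays on a single thread from input to commit, which is what allows the transaction boundary to remain at the chunk level. The order of the committed items is not deterministic.
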
[[asyncItemProcessing]]
=== Asynchronous Item Processing
The individual items in chunks in the <<repeatRetry,typical example>> can also, in
principle, be processed concurrently. In this case, the transaction boundary has to move
to the level of the individual item, so that each transaction is on a single thread, as
shown in the following example:
----
1   |  REPEAT(until=exhausted, exception=not critical) {
    |
2   |    REPEAT(size=5, concurrent) {
    |
3   |      TX {
4   |        RETRY(stateful, exception=deadlock loser) {
4.1 |          input;
5   |        } PROCESS {
    |          output;
6   |        } RECOVER {
    |          recover;
    |        }
    |      }
    |
    |    }
    |
    |  }
----
This plan sacrifices the optimization benefit, which the simple plan had, of having all
the transactional resources chunked together. It is only useful if the cost of the
processing (5) is much higher than the cost of transaction management (3).
[[transactionPropagation]]
=== Interactions Between Batching and Transaction Propagation
There is a tighter coupling between batch-retry and transaction management than we would
ideally like. In particular, a stateless retry cannot be used to retry database
operations with a transaction manager that does not support NESTED propagation.
The following example uses retry without repeat:
----
1   |  TX {
    |
1.1 |    input;
1.2 |    database access;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
    |      }
    |    }
    |
    |  }
----
With default propagation, the inner transaction, `TX` (3), joins the outer transaction,
`TX` (1), so a rollback in `TX` (3) can cause `TX` (1) to fail, even if the `RETRY` (2)
is eventually successful.
Unfortunately, the same effect percolates from the retry block up to the surrounding
repeat batch if there is one, as shown in the following example:
----
1   |  TX {
    |
2   |    REPEAT(size=5) {
2.1 |      input;
2.2 |      database access;
3   |      RETRY {
4   |        TX {
4.1 |          database access;
    |        }
    |      }
    |    }
    |
    |  }
----
Now, if `TX` (4) rolls back, it can pollute the whole batch at `TX` (1) and force it to
roll back at the end.

What about non-default propagation?

* In the preceding example, `PROPAGATION_REQUIRES_NEW` at `TX` (4) prevents the outer
`TX` (1) from being polluted if both transactions are eventually successful. But if `TX`
(4) commits and `TX` (1) rolls back, then `TX` (4) stays committed, so we violate the
transaction contract for `TX` (1). If `TX` (4) rolls back, `TX` (1) does not necessarily
roll back (but it probably does in practice, because the retry throws a rollback
exception).
* `PROPAGATION_NESTED` at `TX` (4) works as we require in the retry case (and for a
batch with skips): `TX` (4) can commit but subsequently be rolled back by the outer
transaction, `TX` (1). If `TX` (4) rolls back, `TX` (1) rolls back in practice. This
option is only available on some platforms, not including Hibernate or
JTA, but it is the only one that consistently works.

Consequently, the `NESTED` pattern is best if the retry block contains any database
access.
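
The difference between default and `NESTED` propagation for a retried inner transaction can be illustrated with a small plain-Java model. This is only a sketch: `PropagationSketch`, its `Tx` class, and the `rollbackOnly` flag are hypothetical stand-ins, and the savepoint is modeled as an index into a list of writes. It does not use a real transaction manager.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of propagation behavior; not a Spring type or API.
class PropagationSketch {

    /** Minimal stand-in for a transactional resource. */
    static class Tx {
        final List<String> writes = new ArrayList<>();
        boolean rollbackOnly = false;

        int savepoint() { return writes.size(); }

        void rollbackTo(int savepoint) { writes.subList(savepoint, writes.size()).clear(); }
    }

    /**
     * Runs one outer write plus a retried inner write. The inner write fails on the first
     * failuresBeforeSuccess attempts. Returns the committed writes, or an empty list if
     * the outer transaction rolled back.
     */
    static List<String> run(boolean nested, int failuresBeforeSuccess, int maxAttempts) {
        Tx tx = new Tx();                                   // outer TX
        tx.writes.add("outer");                             // database access in the outer TX
        boolean done = false;
        for (int attempt = 1; attempt <= maxAttempts && !done; attempt++) {  // stateless RETRY
            int savepoint = tx.savepoint();                 // inner TX begins
            tx.writes.add("inner-attempt-" + attempt);      // database access in the inner TX
            if (attempt <= failuresBeforeSuccess) {         // simulated failure after the write
                if (nested) {
                    tx.rollbackTo(savepoint);               // NESTED: undo only the inner work
                } else {
                    tx.rollbackOnly = true;                 // default: the shared TX is polluted
                }
            } else {
                done = true;
            }
        }
        if (tx.rollbackOnly || !done) {
            return new ArrayList<>();                       // outer TX rolls back everything
        }
        return new ArrayList<>(tx.writes);                  // outer TX commits
    }
}
```

In this model, `run(true, 2, 3)` commits the outer write and the third inner attempt, while `run(false, 2, 3)` commits nothing even though the third attempt succeeded, which mirrors the pollution described above.
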
[[specialTransactionOrthonogonal]]
=== Special Case: Transactions with Orthogonal Resources
Default propagation is always OK for simple cases where there are no nested database
transactions. Consider the following example, where the `SESSION` and `TX` are not
global `XA` resources, so their resources are orthogonal:
----
0   |  SESSION {
1   |    input;
2   |    RETRY {
3   |      TX {
3.1 |        database access;
    |      }
    |    }
    |  }
----
Here there is a transactional message `SESSION` (0), but it does not participate in other
transactions with `PlatformTransactionManager`, so it does not propagate when `TX` (3)
starts. There is no database access outside the `RETRY` (2) block. If `TX` (3) fails and
then eventually succeeds on a retry, `SESSION` (0) can commit (independently of a `TX`
block). This is similar to the vanilla "best-efforts-one-phase-commit" scenario. The
worst that can happen is a duplicate message when the `RETRY` (2) succeeds and the
`SESSION` (0) cannot commit (for example, because the message system is unavailable).
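
The duplicate-message case can be made concrete with a small simulation. The sketch below assumes a broker that redelivers a message until the session commits, and a database that commits independently inside the retry. `OrthogonalResourcesSketch`, `deliver`, and its parameters are hypothetical names, and no messaging or database API is involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of two independent (non-XA) resources; not a Spring API.
class OrthogonalResourcesSketch {

    /**
     * Delivers one message and returns the resulting database rows.
     * sessionCommitFailures: how many times the SESSION commit fails before it succeeds.
     * dbFailuresPerDelivery: how many TX attempts fail on each delivery before one commits.
     */
    static List<String> deliver(String message, int sessionCommitFailures,
                                int dbFailuresPerDelivery, int maxAttempts) {
        List<String> database = new ArrayList<>();
        int remainingSessionFailures = sessionCommitFailures;
        boolean acknowledged = false;
        while (!acknowledged) {                    // 0: SESSION, 1: input (message received)
            boolean stored = false;
            for (int attempt = 1; attempt <= maxAttempts && !stored; attempt++) {  // 2: RETRY
                if (attempt > dbFailuresPerDelivery) {
                    database.add(message);         // 3, 3.1: TX commits on its own
                    stored = true;
                }
                // Otherwise TX (3) rolled back alone; SESSION (0) is unaffected.
            }
            if (!stored) {
                throw new IllegalStateException("retries exhausted; SESSION (0) would roll back");
            }
            if (remainingSessionFailures > 0) {
                remainingSessionFailures--;        // SESSION commit fails: message is redelivered
            } else {
                acknowledged = true;               // SESSION commits
            }
        }
        return database;
    }
}
```

When the session commit fails once after a successful database commit, the same message is stored twice, so the database access in this scenario should be idempotent or able to detect duplicates.
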
[[statelessRetryCannotRecover]]
=== Stateless Retry Cannot Recover
The distinction between a stateless and a stateful retry in the
<<repeatRetry,typical example>> is important. A transactional constraint ultimately
forces the distinction, and the same constraint makes it clear why the distinction
exists.
We start with the observation that there is no way to skip an item that failed and
successfully commit the rest of the chunk unless we wrap the item processing in a
transaction. Consequently, we simplify the typical batch execution plan to be as
follows:
----
0   |  REPEAT(until=exhausted) {
    |
1   |    TX {
2   |      REPEAT(size=5) {
    |
3   |        RETRY(stateless) {
4   |          TX {
4.1 |            input;
4.2 |            database access;
    |          }
5   |        } RECOVER {
5.1 |          skip;
    |        }
    |
    |      }
    |    }
    |
    |  }
----
The preceding example shows a stateless `RETRY` (3) with a `RECOVER` (5) path that kicks
in after the final attempt fails. The `stateless` label means that the block is repeated
without re-throwing any exception up to some limit. This only works if the transaction,
`TX` (4), has propagation NESTED.
If the inner `TX` (4) has default propagation properties and rolls back, it pollutes the
outer `TX` (1). The inner transaction is assumed by the transaction manager to have
corrupted the transactional resource, so it cannot be used again.
Support for NESTED propagation is sufficiently rare that we choose not to support
recovery with stateless retries in the current versions of Spring Batch. The same effect
can always be achieved (at the expense of repeating more processing) by using the
typical pattern above.