[role="xpack"]
[[transform-checkpoints]]
=== How {transform} checkpoints work
++++
<titleabbrev>How checkpoints work</titleabbrev>
++++
Each time a {transform} examines the source indices and creates or
updates the destination index, it generates a _checkpoint_.

If your {transform} runs only once, there is logically only one
checkpoint. If your {transform} runs continuously, however, it creates
checkpoints as it ingests and transforms new source data.

To create a checkpoint, the {ctransform}:

. Checks for changes to source indices.
+
Using a simple periodic timer, the {transform} checks for changes to
the source indices. This check runs at the interval defined in the
{transform}'s `frequency` property.
+
If the source indices remain unchanged or if a checkpoint is already in
progress, the {transform} waits for the next timer.
. Identifies which entities have changed.
+
The {transform} searches to see which entities have changed since the
last time it checked. The `sync` configuration object in the {transform}
identifies a time field in the source indices. The {transform} uses the values
in that field to synchronize the source and destination indices. An example
configuration that sets these properties follows this list.
. Updates the destination index (the {dataframe}) with the changed entities.
+
--
The {transform} applies changes related to either new or changed
entities to the destination index. The set of changed entities is paginated. For
each page, the {transform} performs a composite aggregation using a
`terms` query. After all the pages of changes have been applied, the checkpoint
is complete.
--
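
For illustration, the following is a minimal sketch of a {ctransform} that
produces checkpoints on this schedule. The transform ID, index names, field
names, and interval values are hypothetical; only the `frequency`, `sync`, and
`pivot` properties are relevant to the steps above.

[source,console]
----
PUT _transform/ecommerce-customer-transform
{
  "source": { "index": "ecommerce-orders" },
  "dest": { "index": "ecommerce-customers" },
  "frequency": "5m", <1>
  "sync": {
    "time": {
      "field": "order_date", <2>
      "delay": "60s"
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } } <3>
    },
    "aggregations": {
      "total_spent": { "sum": { "field": "taxful_total_price" } }
    }
  }
}
----
<1> The interval at which the {transform} checks the source indices for changes (step 1).
<2> The time field named in the `sync` configuration, used to identify changed entities (step 2).
<3> The `group_by` fields that feed the composite aggregation applied page by page (step 3).
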
This checkpoint process involves both search and indexing activity on the
cluster. We have attempted to favor control over performance while developing
{transforms}. We decided it was preferable for the
{transform} to take longer to complete, rather than to finish quickly
and take precedence in resource consumption. That being said, the cluster still
requires enough resources to support both the composite aggregation search and
the indexing of its results.

TIP: If the cluster experiences unsuitable performance degradation due to the
{transform}, stop the {transform}. Consider whether you can apply a
source query to the {transform} to reduce the scope of data it
processes. Also consider whether the cluster has sufficient resources in place
to support both the composite aggregation search and the indexing of its
results.
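
For example, a `query` in the `source` object limits the data each checkpoint
examines. The following sketch uses the preview API with hypothetical index and
field names and an arbitrary 30-day window:

[source,console]
----
POST _transform/_preview
{
  "source": {
    "index": "ecommerce-orders",
    "query": {
      "range": {
        "order_date": { "gte": "now-30d/d" }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": { "terms": { "field": "customer_id" } }
    },
    "aggregations": {
      "total_spent": { "sum": { "field": "taxful_total_price" } }
    }
  }
}
----

Another lever is the `docs_per_second` property in the {transform} `settings`
object, which limits how many input documents the {transform} processes per
second.
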
[discrete]
[[ml-transform-checkpoint-errors]]
==== Error handling
Failures in {transforms} tend to be related to searching or indexing.
To increase the resiliency of {transforms}, the cursor positions of
the aggregated search and the changed entities search are tracked in memory and
persisted periodically.

Checkpoint failures can be categorized as follows:

* Temporary failures: The checkpoint is retried. If 10 consecutive failures
occur, the {transform} has a failed status. For example, this
situation might occur when there are shard failures and queries return only
partial results.
* Irrecoverable failures: The {transform} immediately fails. For
example, this situation occurs when the source index is not found.
* Adjustment failures: The {transform} retries with adjusted settings.
For example, if a parent circuit breaker memory error occurs during the
composite aggregation, the {transform} receives partial results. The aggregated
search is retried with a smaller number of buckets. This retry is performed at
the interval defined in the `frequency` property for the {transform}. If the
search is retried to the point where it reaches a minimal number of buckets, an
irrecoverable failure occurs.
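
The page size of that composite aggregation starts from the
`max_page_search_size` value in the {transform} `settings` object. As a sketch,
on a memory-constrained cluster you might lower it explicitly; the transform ID
and value below are hypothetical:

[source,console]
----
POST _transform/ecommerce-customer-transform/_update
{
  "settings": {
    "max_page_search_size": 100
  }
}
----
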
If the node running the {transform} fails, the {transform} restarts
from the most recent persisted cursor position. This recovery process might
repeat some of the work the {transform} had already done, but it ensures data
consistency.
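
To see where a {transform} is in this process, the stats API reports the last
completed checkpoint and the progress of the one in flight. The transform ID
below is hypothetical:

[source,console]
----
GET _transform/ecommerce-customer-transform/_stats
----

The `checkpointing` object in the response shows, among other things, how far
the current checkpoint has progressed and how many operations the {transform}
is behind the source indices.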