mirror of
https://github.com/honeymoose/OpenSearch.git
synced 2025-02-06 04:58:50 +00:00
39e81c3ca6
This commit updates the docs about translog retention and flushing to reflect recent changes in how peer recoveries work. It also adds some docs to describe how history is retained for replay using soft deletes and shard history retention leases. Relates #45473
111 lines
5.1 KiB
Plaintext
[[index-modules-translog]]
== Translog

Changes to Lucene are only persisted to disk during a Lucene commit, which is a
relatively expensive operation and so cannot be performed after every index or
delete operation. Changes that happen after one commit and before another will
be removed from the index by Lucene in the event of process exit or hardware
failure.

Lucene commits are too expensive to perform on every individual change, so each
shard copy also writes operations into its _transaction log_ known as the
_translog_. All index and delete operations are written to the translog after
being processed by the internal Lucene index but before they are acknowledged.
In the event of a crash, recent operations that have been acknowledged but not
yet included in the last Lucene commit are instead recovered from the translog
when the shard recovers.

An {es} <<indices-flush,flush>> is the process of performing a Lucene commit and
starting a new translog generation. Flushes are performed automatically in the
background in order to make sure the translog does not grow too large, which
would make replaying its operations take a considerable amount of time during
recovery. The ability to perform a flush manually is also exposed through an
API, although this is rarely needed.
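
The interplay between commits, the translog, and flushes can be illustrated
with a toy model. This is only a sketch of the idea, not how {es} or Lucene
implement it: the `ToyShard` class and all of its attributes and methods are
hypothetical names, plain dictionaries stand in for the Lucene index, and the
list used as the translog is assumed to survive a crash.

```python
class ToyShard:
    """Toy model of one shard copy (hypothetical, for illustration only)."""

    def __init__(self):
        self.committed = {}  # state captured by the last commit; survives a crash
        self.in_memory = {}  # live index state; lost on a crash
        self.translog = []   # operations logged since the last commit; survives a crash

    def index(self, doc_id, doc):
        self.in_memory[doc_id] = doc                   # process in the index first
        self.translog.append(("index", doc_id, doc))   # then log the operation
        return True                                    # only then acknowledge

    def delete(self, doc_id):
        self.in_memory.pop(doc_id, None)
        self.translog.append(("delete", doc_id, None))
        return True

    def flush(self):
        # A flush performs a commit and starts a new translog generation.
        self.committed = dict(self.in_memory)
        self.translog = []

    def crash_and_recover(self):
        # Everything not committed is lost, then replayed from the translog.
        self.in_memory = dict(self.committed)
        for op, doc_id, doc in self.translog:
            if op == "index":
                self.in_memory[doc_id] = doc
            else:
                self.in_memory.pop(doc_id, None)


shard = ToyShard()
shard.index("1", {"title": "first"})
shard.flush()
shard.index("2", {"title": "second"})  # acknowledged, but not yet committed
shard.crash_and_recover()
print(sorted(shard.in_memory))
```

In this model the longer the gap between flushes, the more operations
`crash_and_recover` has to replay, which is the reason flushes happen
automatically in the background.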

[float]
=== Translog settings

The data in the translog is only persisted to disk when the translog is
++fsync++ed and committed. In the event of a hardware failure, an operating
system crash, a JVM crash, or a shard failure, any data written since the
previous translog commit will be lost.

By default, `index.translog.durability` is set to `request`, meaning that
Elasticsearch will only report success of an index, delete, update, or bulk
request to the client after the translog has been successfully ++fsync++ed and
committed on the primary and on every allocated replica. If
`index.translog.durability` is set to `async`, then Elasticsearch ++fsync++s and
commits the translog only every `index.translog.sync_interval`, which means that
any operations that were performed just before a crash may be lost when the node
recovers.
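
As an illustration, the following sketch builds a settings body that switches
an index to `async` durability. The helper names `to_millis` and
`translog_settings` are hypothetical, the time-unit parser only understands
`ms`, `s`, and `m`, and the sketch only constructs the body; sending it to the
<<indices-update-settings,update settings>> API is left to whatever client is
in use. The check mirrors the documented rule that `sync_interval` values less
than `100ms` are not allowed.

```python
import json
import re

# Simplified subset of time units, for illustration only.
_UNIT_TO_MILLIS = {"ms": 1, "s": 1000, "m": 60_000}


def to_millis(value):
    """Convert a time value such as '5s' or '200ms' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", value)
    if match is None:
        raise ValueError(f"unsupported time value: {value!r}")
    return int(match.group(1)) * _UNIT_TO_MILLIS[match.group(2)]


def translog_settings(durability="request", sync_interval="5s"):
    """Build a per-index settings body for the translog (hypothetical helper)."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    if to_millis(sync_interval) < 100:
        raise ValueError("sync_interval values less than 100ms are not allowed")
    return {
        "index.translog.durability": durability,
        "index.translog.sync_interval": sync_interval,
    }


# Trade durability for throughput: fsync in the background every 30 seconds.
body = translog_settings(durability="async", sync_interval="30s")
print(json.dumps(body, indent=2))
```

With these values, up to roughly 30 seconds of acknowledged operations could be
lost if a node crashes, so `async` is only appropriate when that loss is
acceptable.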

The following <<indices-update-settings,dynamically updatable>> per-index
settings control the behaviour of the translog:

`index.translog.sync_interval`::

How often the translog is ++fsync++ed to disk and committed, regardless of
write operations. Defaults to `5s`. Values less than `100ms` are not allowed.

`index.translog.durability`::
+
--

Whether or not to `fsync` and commit the translog after every index, delete,
update, or bulk request. This setting accepts the following values:

`request`::

(default) `fsync` and commit after every request. In the event of hardware
failure, all acknowledged writes will already have been committed to disk.

`async`::

`fsync` and commit in the background every `sync_interval`. In the event of a
failure, all acknowledged writes since the last automatic commit will be
discarded.
--

`index.translog.flush_threshold_size`::

The translog stores all operations that are not yet safely persisted in Lucene
(i.e., are not part of a Lucene commit point). Although these operations are
available for reads, they will need to be replayed if the shard is stopped
and has to be recovered. This setting controls the maximum total size of these
operations, to prevent recoveries from taking too long. Once the maximum size
has been reached, a flush will happen, generating a new Lucene commit point.
Defaults to `512mb`.
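
The effect of `index.translog.flush_threshold_size` can be sketched with a
small simulation. The function `simulate_flushes` is a hypothetical
illustration, not the logic {es} uses: it assumes each operation has a known
size in bytes and that a flush is triggered as soon as the accumulated size of
uncommitted operations reaches the threshold.

```python
def simulate_flushes(op_sizes, flush_threshold_bytes):
    """Return the positions of the operations that trigger a flush."""
    flush_points = []
    uncommitted_bytes = 0
    for position, size in enumerate(op_sizes):
        uncommitted_bytes += size
        if uncommitted_bytes >= flush_threshold_bytes:
            # The flush creates a new commit point, so nothing logged so far
            # would need to be replayed after a restart.
            flush_points.append(position)
            uncommitted_bytes = 0
    return flush_points


# Five operations of 300 bytes each against a 512-byte threshold.
print(simulate_flushes([300, 300, 300, 300, 300], 512))  # prints [1, 3]
```

A smaller threshold produces more frequent flushes and less to replay during
recovery, at the cost of more of the relatively expensive Lucene commits.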

[float]
[[index-modules-translog-retention]]
==== Translog retention

If an index is not using <<index-modules-history-retention,soft deletes>> to
retain historical operations, then {es} recovers each replica shard by replaying
operations from the primary's translog. This means it is important for the
primary to preserve extra operations in its translog in case it needs to
rebuild a replica. Moreover, it is important for each replica to preserve extra
operations in its translog in case it is promoted to primary and then needs to
rebuild its own replicas in turn. The following settings control how much
translog is retained for peer recoveries.

`index.translog.retention.size`::

This controls the total size of translog files to keep for each shard.
Keeping more translog files increases the chance of performing an
operation-based sync when recovering a replica. If the translog files are not
sufficient, replica recovery will fall back to a file-based sync. Defaults to
`512mb`. This setting is ignored, and should not be set, if soft deletes are
enabled. Soft deletes are enabled by default in indices created in {es}
versions 7.0.0 and later.

`index.translog.retention.age`::

This controls the maximum duration for which translog files are kept by each
shard. Keeping more translog files increases the chance of performing an
operation-based sync when recovering replicas. If the translog files are not
sufficient, replica recovery will fall back to a file-based sync. Defaults to
`12h`. This setting is ignored, and should not be set, if soft deletes are
enabled. Soft deletes are enabled by default in indices created in {es}
versions 7.0.0 and later.
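
The way the two retention settings combine can be pictured with the following
sketch. It is a simplified model under stated assumptions, not the actual
retention policy in {es}: `TranslogFile` and `retained_generations` are
hypothetical names, files are assumed to be examined from newest to oldest, and
a file is assumed to be kept only while it is within both the age limit and the
remaining size budget. Files still needed for operations that are not yet part
of a Lucene commit are ignored here.

```python
from dataclasses import dataclass


@dataclass
class TranslogFile:
    generation: int     # higher generation means newer file
    size_bytes: int
    age_seconds: float


def retained_generations(files, retention_size_bytes, retention_age_seconds):
    """Return the generations kept under both limits (toy model)."""
    kept = []
    total_bytes = 0
    for f in sorted(files, key=lambda f: f.generation, reverse=True):
        if f.age_seconds > retention_age_seconds:
            break  # this file and everything older is too old
        if total_bytes + f.size_bytes > retention_size_bytes:
            break  # keeping this file would exceed the size budget
        total_bytes += f.size_bytes
        kept.append(f.generation)
    return sorted(kept)


files = [
    TranslogFile(generation=1, size_bytes=300, age_seconds=50_000),
    TranslogFile(generation=2, size_bytes=200, age_seconds=20_000),
    TranslogFile(generation=3, size_bytes=200, age_seconds=600),
    TranslogFile(generation=4, size_bytes=100, age_seconds=10),
]

# 12 hours expressed in seconds, with a deliberately tiny size budget.
print(retained_generations(files, retention_size_bytes=512,
                           retention_age_seconds=12 * 60 * 60))
```

In this model, raising either limit keeps more generations available, which
corresponds to a better chance of an operation-based sync during replica
recovery.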