[[index-modules-translog]]
== Translog

Changes to a shard are only persisted to disk when the shard is ``flushed'',
which is a relatively heavy operation and so cannot be performed after every
index or delete operation. Instead, changes are accumulated in an in-memory
indexing buffer and only written to disk periodically. This means that the
contents of the in-memory buffer would be lost in the event of a power
failure or some other hardware crash.

To prevent this data loss, each shard has a _transaction log_, or write-ahead
log, associated with it. Any index or delete operation is first written to
the translog before being processed by the internal Lucene index. The
translog is only cleared once the shard has been flushed and the data in the
in-memory buffer has been persisted to disk as a Lucene segment.

In the event of a crash, recent transactions can be replayed from the
transaction log when the shard recovers.
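
A flush can also be triggered manually, for example to persist recent changes
before a planned restart. A minimal sketch using the flush API; the index
name `my_index` is illustrative:

[source,sh]
--------------------------------------------------
curl -XPOST 'localhost:9200/my_index/_flush'
--------------------------------------------------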

[float]
=== Flush settings

The following <<indices-update-settings,dynamically updatable>> settings
control how often the in-memory buffer is flushed to disk:

`index.translog.flush_threshold_size`::

Once the translog reaches this size, a flush will be triggered. Defaults to `512mb`.

`index.translog.flush_threshold_ops`::

The number of operations after which to flush. Defaults to `unlimited`.

`index.translog.flush_threshold_period`::

How long to wait before triggering a flush, regardless of the translog size.
Defaults to `30m`.

`index.translog.interval`::

How often to check whether a flush is needed, randomized between the interval
value and 2x the interval value. Defaults to `5s`.
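
As these settings are dynamic, they can be changed on a live index with the
update settings API. A minimal sketch; the index name `my_index` and the
chosen threshold are illustrative, not recommendations:

[source,sh]
--------------------------------------------------
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
    "index.translog.flush_threshold_size": "256mb"
}'
--------------------------------------------------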

[float]
=== Translog settings

The translog itself is only persisted to disk when it is ++fsync++ed. Until
then, data recently written to the translog may exist only in the file system
cache and could be lost in the event of a hardware failure.

The following <<indices-update-settings,dynamically updatable>> settings
control the behaviour of the transaction log:

`index.translog.sync_interval`::

How often the translog is ++fsync++ed to disk. Defaults to `5s`.

`index.translog.fs.type`::

Either a `buffered` translog (the default), which buffers up to 64kB in
memory before writing to disk, or a `simple` translog, which writes every
entry to disk immediately. Whichever is used, these writes are only
++fsync++ed according to the `sync_interval`.

The `buffered` translog is written to disk when it reaches 64kB in size, or
whenever an `fsync` is triggered by the `sync_interval`.
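
Both settings are dynamic, so they too can be changed on a live index via the
update settings API. A minimal sketch; the index name and values shown are
illustrative only:

[source,sh]
--------------------------------------------------
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
    "index.translog.sync_interval": "10s",
    "index.translog.fs.type": "simple"
}'
--------------------------------------------------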

.Why don't we `fsync` the translog after every write?
******************************************************

The disk is the slowest part of any server. An `fsync` ensures that data in
the file system buffer has been physically written to disk, but this
persistence comes at a performance cost.

However, the translog is not the only persistence mechanism in Elasticsearch.
Any index or update request is first written to the primary shard, then
forwarded in parallel to any replica shards. The primary waits for the action
to be completed on the replicas before reporting success to the client.

If the node holding the primary shard dies for some reason, its transaction
log could be missing the last 5 seconds of data. However, that data should
already be available on a replica shard on a different node. Of course, if
the whole data centre loses power at the same time, then it is possible that
you could lose the last 5 seconds (or `sync_interval`) of data.

******************************************************