Clarify the resiliency trade-off of disabling replicas to speed up indexing. (#52714)
We should be more explicit about the downsides of disabling replicas and explain that users should be ready to re-do the entire load in case of issues mid-way.
parent 5ce66b8b3c
commit 9b0ddc1c03
@@ -58,15 +58,18 @@ gets indexed and when it becomes visible, increasing the
 `30s`, might help improve indexing speed.

 [float]
-=== Disable refresh and replicas for initial loads
+=== Disable replicas for initial loads

-If you need to load a large amount of data at once, you should disable refresh
-by setting `index.refresh_interval` to `-1` and set `index.number_of_replicas`
-to `0`. This will temporarily put your index at risk since the loss of any shard
-will cause data loss, but at the same time indexing will be faster since
-documents will be indexed only once. Once the initial loading is finished, you
-can set `index.refresh_interval` and `index.number_of_replicas` back to their
-original values.
+If you have a large amount of data that you want to load all at once into
+Elasticsearch, it may be beneficial to set `index.number_of_replicas` to `0` in
+order to speed up indexing. Having no replicas means that losing a single node
+may incur data loss, so it is important that the data lives elsewhere so that
+this initial load can be retried in case of an issue. Once the initial load is
+finished, you can set `index.number_of_replicas` back to its original value.
+
+If `index.refresh_interval` is configured in the index settings, it may further
+help to unset it during this initial load and set it back to its original
+value once the initial load is finished.

 [float]
 === Disable swapping
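The workflow described in the added paragraphs (drop replicas and unset `index.refresh_interval`, run the load, then restore the original values) could be sketched roughly as follows. This is only an illustration, not part of the commit. It assumes an unsecured cluster at `http://localhost:9200` and sends the settings through the update index settings endpoint (`PUT /<index>/_settings`), where a `null` value unsets a setting. The helper names `bulk_load_settings`, `restore_settings`, `put_settings`, and `initial_load` are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def bulk_load_settings():
    """Settings for the initial load: no replicas, refresh_interval unset (null)."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": None}}


def restore_settings(number_of_replicas, refresh_interval=None):
    """Settings to put back once the load succeeded (None leaves refresh_interval unset)."""
    return {
        "index": {
            "number_of_replicas": number_of_replicas,
            "refresh_interval": refresh_interval,
        }
    }


def put_settings(index, body, base_url=ES_URL):
    """Send a settings update to PUT /<index>/_settings and return the parsed response."""
    req = urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def initial_load(index, load_fn, original_replicas, original_refresh=None,
                 put=put_settings):
    """Run load_fn with replicas disabled, restoring settings only on success.

    If load_fn raises, the exception propagates and the original settings are
    not restored: with no replicas, the caller should be prepared to redo the
    whole load from the source data.
    """
    put(index, bulk_load_settings())
    load_fn()  # hypothetical callable that performs the actual bulk indexing
    put(index, restore_settings(original_replicas, original_refresh))
```

A call such as `initial_load("my-index", run_bulk_import, original_replicas=1, original_refresh="30s")` would then wrap a user-supplied `run_bulk_import` function. Restoring only after success is a deliberate choice in this sketch, since a failed load with zero replicas is expected to be retried in full.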