Clarify the resiliency trade-off of disabling replicas to speed up indexing. (#52714)

We should be more explicit about the downsides of disabling replicas and explain that users should be ready to re-do the entire load in case of issues mid-way.
2025-02-28 07:59:10 +00:00 · 2020-02-25 08:52:33 +01:00 · 2020-02-25 08:52:33 +01:00 · 9b0ddc1c03
commit 9b0ddc1c03
parent 5ce66b8b3c
1 changed file with 11 additions and 8 deletions
--- a/docs/reference/how-to/indexing-speed.asciidoc
+++ b/docs/reference/how-to/indexing-speed.asciidoc
@@ -58,15 +58,18 @@ gets indexed and when it becomes visible, increasing the
 `30s`, might help improve indexing speed.

 [float]
-=== Disable refresh and replicas for initial loads
+=== Disable replicas for initial loads

-If you need to load a large amount of data at once, you should disable refresh
-by setting `index.refresh_interval` to `-1` and set `index.number_of_replicas`
-to `0`. This will temporarily put your index at risk since the loss of any shard
-will cause data loss, but at the same time indexing will be faster since
-documents will be indexed only once. Once the initial loading is finished, you
-can set `index.refresh_interval` and `index.number_of_replicas` back to their
-original values.
+If you have a large amount of data that you want to load all at once into
+Elasticsearch, it may be beneficial to set `index.number_of_replicas` to `0` in
+order to speed up indexing. Having no replicas means that losing a single node
+may incur data loss, so it is important that the data lives elsewhere so that
+this initial load can be retried in case of an issue. Once the initial load is
+finished, you can set `index.number_of_replicas` back to its original value.
+
+If `index.refresh_interval` is configured in the index settings, it may further
+help to unset it during this initial load and set it back to its original
+value once the initial load is finished.

 [float]
 === Disable swapping
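
The workflow described in the new "Disable replicas for initial loads" text could be sketched roughly as follows. This is an illustration rather than part of the commit: the helper names `put_index_settings` and `initial_load` are hypothetical, and the sketch assumes the cluster is reachable over plain HTTP without authentication. It relies only on the update index settings API (`PUT /<index>/_settings`).

```python
import json
import urllib.request


def put_index_settings(base_url, index, settings):
    """Send PUT <base_url>/<index>/_settings with the given index settings.

    Hypothetical helper; assumes no authentication or TLS is required.
    """
    request = urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=json.dumps({"index": settings}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def initial_load(index, original_replicas, load, update_settings):
    """Drop replicas, run the load, then restore the replica count.

    Replicas are restored only after `load` returns. If `load` raises,
    the exception propagates and the caller is expected to retry the
    whole load from the source data, as the documentation advises.
    """
    update_settings(index, {"number_of_replicas": 0})
    load()
    update_settings(index, {"number_of_replicas": original_replicas})
```

A caller might pass `update_settings=lambda idx, s: put_index_settings("http://localhost:9200", idx, s)` together with a `load` callable that performs the actual bulk indexing; the address and the index name are placeholders. Injecting `update_settings` keeps the ordering logic separate from the HTTP call, so it can be exercised without a running cluster.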