From 8516fb0f3bb3c49babbfd8fa93fdad3bab9598c1 Mon Sep 17 00:00:00 2001
From: David Turner
Date: Tue, 23 Jul 2019 07:32:44 +0100
Subject: [PATCH] Expand docs on force-merge and global ordinals (#44684)

Some small clarifications about force-merging and global ordinals, particularly
that global ordinals are cheap on a single-segment index and how this relates
to frozen indices.

Fixes #41687
---
 docs/reference/how-to/search-speed.asciidoc   | 19 +++++---
 docs/reference/indices/forcemerge.asciidoc    | 43 ++++++++++++++-----
 .../params/eager-global-ordinals.asciidoc     | 24 +++++++++--
 3 files changed, 64 insertions(+), 22 deletions(-)

diff --git a/docs/reference/how-to/search-speed.asciidoc b/docs/reference/how-to/search-speed.asciidoc
index e09b18df20c..9c445a148db 100644
--- a/docs/reference/how-to/search-speed.asciidoc
+++ b/docs/reference/how-to/search-speed.asciidoc
@@ -299,13 +299,17 @@ leveraging the query cache.
 [float]
 === Force-merge read-only indices

-Indices that are read-only would benefit from being
-<>. This is typically the
-case with time-based indices: only the index for the current time frame is
-getting new documents while older indices are read-only.
+Indices that are read-only may benefit from being <>. This is typically the case with time-based indices:
+only the index for the current time frame is getting new documents while older
+indices are read-only. Shards that have been force-merged into a single segment
+can use simpler and more efficient data structures to perform searches.

-IMPORTANT: Don't force-merge indices that are still being written to -- leave
-merging to the background merge process.
+IMPORTANT: Do not force-merge indices to which you are still writing, or to
+which you will write again in the future. Instead, rely on the automatic
+background merge process to perform merges as needed to keep the index running
+smoothly. If you continue to write to a force-merged index then its performance
+may become much worse.
 [float]
 === Warm up global ordinals
@@ -315,7 +319,8 @@ Global ordinals are a data-structure that is used in order to run
 <> fields. They are loaded lazily in memory because
 Elasticsearch does not know which fields will be used in `terms` aggregations
 and which fields won't. You can tell Elasticsearch to load global ordinals
-eagerly at refresh-time by configuring mappings as described below:
+eagerly when starting or refreshing a shard by configuring mappings as
+described below:

 [source,js]
 --------------------------------------------------
diff --git a/docs/reference/indices/forcemerge.asciidoc b/docs/reference/indices/forcemerge.asciidoc
index 79ffb19614b..f478b5743e2 100644
--- a/docs/reference/indices/forcemerge.asciidoc
+++ b/docs/reference/indices/forcemerge.asciidoc
@@ -1,19 +1,24 @@
 [[indices-forcemerge]]
 === Force Merge

-The force merge API allows to force merging of one or more indices through an
-API. The merge relates to the number of segments a Lucene index holds within
-each shard. The force merge operation allows to reduce the number of segments by
-merging them.
+The force merge API allows you to force a <> on the
+shards of one or more indices. Merging reduces the number of segments in each
+shard by merging some of them together, and also frees up the space used by
+deleted documents. Merging normally happens automatically, but sometimes it is
+useful to trigger a merge manually.

-This call will block until the merge is complete. If the http connection is
-lost, the request will continue in the background, and any new requests will
-block until the previous force merge is complete.
+WARNING: **Force merge should only be called against an index after you have
+finished writing to it.** Force merge can cause very large (>5GB) segments to
+be produced, and if you continue to write to such an index then the automatic
+merge policy will never consider these segments for future merges until they
+mostly consist of deleted documents. This can cause very large segments to
+remain in the index which can result in increased disk usage and worse search
+performance.

-WARNING: Force merge should only be called against *read-only indices*. Running
-force merge against a read-write index can cause very large segments to be produced
-(>5Gb per segment), and the merge policy will never consider it for merging again until
-it mostly consists of deleted docs. This can cause very large segments to remain in the shards.
+Calls to this API block until the merge is complete. If the client connection
+is lost before completion then the force merge process will continue in the
+background. Any new requests to force merge the same indices will also block
+until the ongoing force merge is complete.

 [source,js]
 --------------------------------------------------
@@ -22,6 +27,22 @@ POST /twitter/_forcemerge
 // CONSOLE
 // TEST[setup:twitter]

+Force-merging can be useful with time-based indices and when using
+<>. In these cases each index only receives
+indexing traffic for a certain period of time, and once an index will receive
+no more writes its shards can be force-merged down to a single segment:
+
+[source,js]
+--------------------------------------------------
+POST /logs-000001/_forcemerge?max_num_segments=1
+--------------------------------------------------
+// CONSOLE
+// TEST[setup:twitter]
+// TEST[s/logs-000001/twitter/]
+
+This can be a good idea because single-segment shards can sometimes use simpler
+and more efficient data structures to perform searches.
+
 [float]
 [[forcemerge-parameters]]
 ==== Request Parameters
diff --git a/docs/reference/mapping/params/eager-global-ordinals.asciidoc b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
index 8973be95112..162049ec132 100644
--- a/docs/reference/mapping/params/eager-global-ordinals.asciidoc
+++ b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
@@ -30,7 +30,7 @@ efficiently compressed.
 By default, global ordinals are loaded at search-time, which is the right
 trade-off if you are optimizing for indexing speed. However, if you are more
-interested in search speed, it could be interesting to set
+interested in search speed, it could be beneficial to set
 `eager_global_ordinals: true` on fields that you plan to use in terms
 aggregations:

@@ -49,9 +49,25 @@ PUT my_index/_mapping
 // CONSOLE
 // TEST[s/^/PUT my_index\n/]

-This will shift the cost from search-time to refresh-time. Elasticsearch will
-make sure that global ordinals are built before publishing updates to the
-content of the index.
+This will shift the cost of building the global ordinals from search-time to
+refresh-time. Elasticsearch will make sure that global ordinals are built
+before exposing to searches any changes to the content of the index.
+Elasticsearch will also eagerly build global ordinals when starting a new copy
+of a shard, such as when increasing the number of replicas or when relocating a
+shard onto a new node.
+
+If a shard has been <> down to a single
+segment then its global ordinals are identical to the ordinals for its unique
+segment, which means there is no extra cost for using global ordinals on such a
+shard. Note that for performance reasons you should only force-merge an index
+to which you will never write again.
+
+On a <>, global ordinals are discarded after each
+search and rebuilt again on the next search if needed or if
+`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
+be used on frozen indices. Instead, force-merge an index to a single segment
+before freezing it so that global ordinals need not be built separately on each
+search.

 If you ever decide that you do not need to run `terms` aggregations on this
 field anymore, then you can disable eager loading of global ordinals at any