Expand docs on force-merge and global ordinals (#44684)

Some small clarifications about force-merging and global ordinals, particularly that global ordinals are cheap on a single-segment index and how this relates to frozen indices. Fixes #41687
2019-07-23 07:32:44 +01:00 · 8516fb0f3b
parent ed8f75c990
commit 8516fb0f3b
3 changed files with 64 additions and 22 deletions
--- a/docs/reference/how-to/search-speed.asciidoc
+++ b/docs/reference/how-to/search-speed.asciidoc
@@ -299,13 +299,17 @@ leveraging the query cache.
 [float]
 === Force-merge read-only indices

-Indices that are read-only would benefit from being
-<<indices-forcemerge,merged down to a single segment>>. This is typically the
-case with time-based indices: only the index for the current time frame is
-getting new documents while older indices are read-only.
+Indices that are read-only may benefit from being <<indices-forcemerge,merged
+down to a single segment>>. This is typically the case with time-based indices:
+only the index for the current time frame is getting new documents while older
+indices are read-only. Shards that have been force-merged into a single segment
+can use simpler and more efficient data structures to perform searches.

-IMPORTANT: Don't force-merge indices that are still being written to -- leave
-merging to the background merge process.
+IMPORTANT: Do not force-merge indices to which you are still writing, or to
+which you will write again in the future. Instead, rely on the automatic
+background merge process to perform merges as needed to keep the index running
+smoothly. If you continue to write to a force-merged index then its performance
+may become much worse.

 [float]
 === Warm up global ordinals
@@ -315,7 +319,8 @@ Global ordinals are a data-structure that is used in order to run
 <<keyword,`keyword`>> fields. They are loaded lazily in memory because
 Elasticsearch does not know which fields will be used in `terms` aggregations
 and which fields won't. You can tell Elasticsearch to load global ordinals
-eagerly at refresh-time by configuring mappings as described below:
+eagerly when starting or refreshing a shard by configuring mappings as
+described below:

 [source,js]
 --------------------------------------------------
--- a/docs/reference/indices/forcemerge.asciidoc
+++ b/docs/reference/indices/forcemerge.asciidoc
@@ -1,19 +1,24 @@
 [[indices-forcemerge]]
 === Force Merge

-The force merge API allows to force merging of one or more indices through an
-API. The merge relates to the number of segments a Lucene index holds within
-each shard. The force merge operation allows to reduce the number of segments by
-merging them.
+The force merge API allows you to force a <<index-modules-merge,merge>> on the
+shards of one or more indices. Merging reduces the number of segments in each
+shard by merging some of them together, and also frees up the space used by
+deleted documents. Merging normally happens automatically, but sometimes it is
+useful to trigger a merge manually.

-This call will block until the merge is complete. If the http connection is
-lost, the request will continue in the background, and any new requests will
-block until the previous force merge is complete.
+WARNING: **Force merge should only be called against an index after you have
+finished writing to it.** Force merge can cause very large (>5GB) segments to
+be produced, and if you continue to write to such an index then the automatic
+merge policy will never consider these segments for future merges until they
+mostly consist of deleted documents. This can cause very large segments to
+remain in the index which can result in increased disk usage and worse search
+performance.

-WARNING: Force merge should only be called against *read-only indices*. Running 
-force merge against a read-write index can cause very large segments to be produced 
-(>5Gb per segment), and the merge policy will never consider it for merging again until 
-it mostly consists of deleted docs. This can cause very large segments to remain in the shards.
+Calls to this API block until the merge is complete. If the client connection
+is lost before completion then the force merge process will continue in the
+background. Any new requests to force merge the same indices will also block
+until the ongoing force merge is complete.

 [source,js]
 --------------------------------------------------
@@ -22,6 +27,22 @@ POST /twitter/_forcemerge
 // CONSOLE
 // TEST[setup:twitter]

+Force-merging can be useful with time-based indices and when using
+<<indices-rollover-index,rollover>>. In these cases each index only receives
+indexing traffic for a certain period of time, and once an index will receive
+no more writes its shards can be force-merged down to a single segment:
+
+[source,js]
+--------------------------------------------------
+POST /logs-000001/_forcemerge?max_num_segments=1
+--------------------------------------------------
+// CONSOLE
+// TEST[setup:twitter]
+// TEST[s/logs-000001/twitter/]
+
+This can be a good idea because single-segment shards can sometimes use simpler
+and more efficient data structures to perform searches.
+
 [float]
 [[forcemerge-parameters]]
 ==== Request Parameters
--- a/docs/reference/mapping/params/eager-global-ordinals.asciidoc
+++ b/docs/reference/mapping/params/eager-global-ordinals.asciidoc
@@ -30,7 +30,7 @@ efficiently compressed.

 By default, global ordinals are loaded at search-time, which is the right
 trade-off if you are optimizing for indexing speed. However, if you are more
-interested in search speed, it could be interesting to set
+interested in search speed, it could be beneficial to set
 `eager_global_ordinals: true` on fields that you plan to use in terms
 aggregations:

@@ -49,9 +49,25 @@ PUT my_index/_mapping
 // CONSOLE
 // TEST[s/^/PUT my_index\n/]

-This will shift the cost from search-time to refresh-time. Elasticsearch will
-make sure that global ordinals are built before publishing updates to the
-content of the index.
+This will shift the cost of building the global ordinals from search-time to
+refresh-time. Elasticsearch will make sure that global ordinals are built
+before exposing to searches any changes to the content of the index.
+Elasticsearch will also eagerly build global ordinals when starting a new copy
+of a shard, such as when increasing the number of replicas or when relocating a
+shard onto a new node.
+
+If a shard has been <<indices-forcemerge,force-merged>> down to a single
+segment then its global ordinals are identical to the ordinals for its unique
+segment, which means there is no extra cost for using global ordinals on such a
+shard. Note that for performance reasons you should only force-merge an index
+to which you will never write again.
+
+On a <<frozen-indices,frozen index>>, global ordinals are discarded after each
+search and rebuilt again on the next search if needed or if
+`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
+be used on frozen indices. Instead, force-merge an index to a single segment
+before freezing it so that global ordinals need not be built separately on each
+search.

 If you ever decide that you do not need to run `terms` aggregations on this
 field anymore, then you can disable eager loading of global ordinals at any