Expand docs on force-merge and global ordinals (#44684)

Some small clarifications about force-merging and global ordinals, particularly
that global ordinals are cheap on a single-segment index and how this relates
to frozen indices.

Fixes #41687
David Turner 2019-07-23 07:32:44 +01:00
parent ed8f75c990
commit 8516fb0f3b
3 changed files with 64 additions and 22 deletions

@@ -299,13 +299,17 @@ leveraging the query cache.
[float]
=== Force-merge read-only indices
Indices that are read-only would benefit from being
<<indices-forcemerge,merged down to a single segment>>. This is typically the
case with time-based indices: only the index for the current time frame is
getting new documents while older indices are read-only.
Indices that are read-only may benefit from being <<indices-forcemerge,merged
down to a single segment>>. This is typically the case with time-based indices:
only the index for the current time frame is getting new documents while older
indices are read-only. Shards that have been force-merged into a single segment
can use simpler and more efficient data structures to perform searches.
IMPORTANT: Don't force-merge indices that are still being written to -- leave
merging to the background merge process.
IMPORTANT: Do not force-merge indices to which you are still writing, or to
which you will write again in the future. Instead, rely on the automatic
background merge process to perform merges as needed to keep the index running
smoothly. If you continue to write to a force-merged index then its performance
may become much worse.
[float]
=== Warm up global ordinals
@@ -315,7 +319,8 @@ Global ordinals are a data-structure that is used in order to run
<<keyword,`keyword`>> fields. They are loaded lazily in memory because
Elasticsearch does not know which fields will be used in `terms` aggregations
and which fields won't. You can tell Elasticsearch to load global ordinals
eagerly at refresh-time by configuring mappings as described below:
eagerly when starting or refreshing a shard by configuring mappings as
described below:
[source,js]
--------------------------------------------------

@@ -1,19 +1,24 @@
[[indices-forcemerge]]
=== Force Merge
The force merge API allows to force merging of one or more indices through an
API. The merge relates to the number of segments a Lucene index holds within
each shard. The force merge operation allows to reduce the number of segments by
merging them.
The force merge API allows you to force a <<index-modules-merge,merge>> on the
shards of one or more indices. Merging reduces the number of segments in each
shard by merging some of them together, and also frees up the space used by
deleted documents. Merging normally happens automatically, but sometimes it is
useful to trigger a merge manually.
This call will block until the merge is complete. If the http connection is
lost, the request will continue in the background, and any new requests will
block until the previous force merge is complete.
WARNING: **Force merge should only be called against an index after you have
finished writing to it.** Force merge can cause very large (>5GB) segments to
be produced, and if you continue to write to such an index then the automatic
merge policy will never consider these segments for future merges until they
mostly consist of deleted documents. This can cause very large segments to
remain in the index which can result in increased disk usage and worse search
performance.
WARNING: Force merge should only be called against *read-only indices*. Running
force merge against a read-write index can cause very large segments to be produced
(>5Gb per segment), and the merge policy will never consider it for merging again until
it mostly consists of deleted docs. This can cause very large segments to remain in the shards.
Calls to this API block until the merge is complete. If the client connection
is lost before completion then the force merge process will continue in the
background. Any new requests to force merge the same indices will also block
until the ongoing force merge is complete.
[source,js]
--------------------------------------------------
@@ -22,6 +27,22 @@ POST /twitter/_forcemerge
// CONSOLE
// TEST[setup:twitter]
Force-merging can be useful with time-based indices and when using
<<indices-rollover-index,rollover>>. In these cases each index only receives
indexing traffic for a certain period of time, and once an index will receive
no more writes its shards can be force-merged down to a single segment:
[source,js]
--------------------------------------------------
POST /logs-000001/_forcemerge?max_num_segments=1
--------------------------------------------------
// CONSOLE
// TEST[setup:twitter]
// TEST[s/logs-000001/twitter/]
This can be a good idea because single-segment shards can sometimes use simpler
and more efficient data structures to perform searches.
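
For readers who script this step, the following is a minimal Python sketch of
how the request above could be built from a client. It only constructs the
request and does not send it. The helper name `build_forcemerge_request` and the
base URL `http://localhost:9200` are assumptions made for illustration; only the
`_forcemerge` endpoint and the `max_num_segments` parameter come from the text
above.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_forcemerge_request(index, max_num_segments=1,
                             base_url="http://localhost:9200"):
    """Build (but do not send) a force-merge request for one index.

    `base_url` is a hypothetical local cluster address.
    """
    query = urlencode({"max_num_segments": max_num_segments})
    url = f"{base_url}/{quote(index, safe='')}/_forcemerge?{query}"
    return Request(url, method="POST")


req = build_forcemerge_request("logs-000001")
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would issue the call,
which blocks until the merge completes as described above.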
[float]
[[forcemerge-parameters]]
==== Request Parameters

@@ -30,7 +30,7 @@ efficiently compressed.
By default, global ordinals are loaded at search-time, which is the right
trade-off if you are optimizing for indexing speed. However, if you are more
interested in search speed, it could be interesting to set
interested in search speed, it could be beneficial to set
`eager_global_ordinals: true` on fields that you plan to use in terms
aggregations:
@@ -49,9 +49,25 @@ PUT my_index/_mapping
// CONSOLE
// TEST[s/^/PUT my_index\n/]
This will shift the cost from search-time to refresh-time. Elasticsearch will
make sure that global ordinals are built before publishing updates to the
content of the index.
This will shift the cost of building the global ordinals from search-time to
refresh-time. Elasticsearch will make sure that global ordinals are built
before exposing to searches any changes to the content of the index.
Elasticsearch will also eagerly build global ordinals when starting a new copy
of a shard, such as when increasing the number of replicas or when relocating a
shard onto a new node.
If a shard has been <<indices-forcemerge,force-merged>> down to a single
segment then its global ordinals are identical to the ordinals for its unique
segment, which means there is no extra cost for using global ordinals on such a
shard. Note that for performance reasons you should only force-merge an index
to which you will never write again.
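
To see why a single-segment shard pays no extra cost, the following toy Python
model may help. It is a simplified sketch, not the actual Lucene
implementation: each segment is modelled as a sorted list of unique terms, a
term's segment ordinal is its position in that list, and the global ordinals
are positions in the sorted union of all segments. The function name
`build_global_ordinals` and the sample terms are hypothetical.

```python
def build_global_ordinals(segments):
    """Return the global term list and, per segment, a list that maps
    each segment ordinal to its global ordinal.

    `segments` is a list of sorted lists of unique terms (a toy model
    of per-segment term dictionaries).
    """
    global_terms = sorted(set().union(*segments))
    global_ord = {term: i for i, term in enumerate(global_terms)}
    # mappings[s][segment_ordinal] -> global ordinal
    mappings = [[global_ord[term] for term in seg] for seg in segments]
    return global_terms, mappings


# Two segments: a translation table must be built for each of them.
multi_terms, multi_map = build_global_ordinals(
    [["apple", "cherry"], ["banana", "cherry", "date"]]
)
print(multi_map)   # [[0, 2], [1, 2, 3]]

# One segment: the mapping is the identity, so nothing extra is needed.
single_terms, single_map = build_global_ordinals(
    [["apple", "banana", "cherry"]]
)
print(single_map)  # [[0, 1, 2]]
```

In the two-segment case the segment ordinals have to be translated through a
mapping; in the single-segment case the mapping is the identity, which is the
situation the paragraph above describes.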
On a <<frozen-indices,frozen index>>, global ordinals are discarded after each
search and rebuilt again on the next search if needed or if
`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
be used on frozen indices. Instead, force-merge an index to a single segment
before freezing it so that global ordinals need not be built separately on each
search.
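
The ordering recommended above can be sketched as a short list of requests.
This is an illustrative Python fragment under stated assumptions: the helper
name `prepare_for_freezing` is hypothetical, and the `_freeze` endpoint is
assumed to be the freeze index API available in the Elasticsearch versions this
change targets. It only lists the calls in order and sends nothing.

```python
def prepare_for_freezing(index):
    """Return the (method, path) calls to make, in order, before an
    index is frozen: force-merge to one segment first, then freeze."""
    return [
        ("POST", f"/{index}/_forcemerge?max_num_segments=1"),
        ("POST", f"/{index}/_freeze"),  # assumed freeze index endpoint
    ]


steps = prepare_for_freezing("logs-000001")
for method, path in steps:
    print(method, path)
```
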
If you ever decide that you do not need to run `terms` aggregations on this
field anymore, then you can disable eager loading of global ordinals at any