Expand docs on force-merge and global ordinals (#44684)
Some small clarifications about force-merging and global ordinals, particularly that global ordinals are cheap on a single-segment index and how this relates to frozen indices.

Fixes #41687
parent ed8f75c990
commit 8516fb0f3b
@@ -299,13 +299,17 @@ leveraging the query cache.
 [float]
 === Force-merge read-only indices
 
-Indices that are read-only would benefit from being
-<<indices-forcemerge,merged down to a single segment>>. This is typically the
-case with time-based indices: only the index for the current time frame is
-getting new documents while older indices are read-only.
+Indices that are read-only may benefit from being <<indices-forcemerge,merged
+down to a single segment>>. This is typically the case with time-based indices:
+only the index for the current time frame is getting new documents while older
+indices are read-only. Shards that have been force-merged into a single segment
+can use simpler and more efficient data structures to perform searches.
 
-IMPORTANT: Don't force-merge indices that are still being written to -- leave
-merging to the background merge process.
+IMPORTANT: Do not force-merge indices to which you are still writing, or to
+which you will write again in the future. Instead, rely on the automatic
+background merge process to perform merges as needed to keep the index running
+smoothly. If you continue to write to a force-merged index then its performance
+may become much worse.
 
 [float]
 === Warm up global ordinals
@@ -315,7 +319,8 @@ Global ordinals are a data-structure that is used in order to run
 <<keyword,`keyword`>> fields. They are loaded lazily in memory because
 Elasticsearch does not know which fields will be used in `terms` aggregations
 and which fields won't. You can tell Elasticsearch to load global ordinals
-eagerly at refresh-time by configuring mappings as described below:
+eagerly when starting or refreshing a shard by configuring mappings as
+described below:
 
 [source,js]
 --------------------------------------------------
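As a rough illustration of the mapping change these hunks refer to, the sketch below builds (but does not send) a `PUT my_index/_mapping` request that sets `eager_global_ordinals` on a `keyword` field, using only the Python standard library. The cluster address `http://localhost:9200`, the field name `tags`, and the helper `eager_ordinals_request` are assumptions for illustration, not part of this commit.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster; adjust for your deployment.
ES_URL = "http://localhost:9200"


def eager_ordinals_request(index: str, field: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a PUT _mapping request
    that enables eager_global_ordinals on a keyword field."""
    body = {
        "properties": {
            field: {
                "type": "keyword",
                # Build global ordinals when a shard starts or refreshes,
                # instead of lazily on the first terms aggregation.
                "eager_global_ordinals": True,
            }
        }
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_mapping",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = eager_ordinals_request("my_index", "tags")  # "tags" is a hypothetical field
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; building the same body with `False` would turn eager loading off again.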
@@ -1,19 +1,24 @@
 [[indices-forcemerge]]
 === Force Merge
 
-The force merge API allows to force merging of one or more indices through an
-API. The merge relates to the number of segments a Lucene index holds within
-each shard. The force merge operation allows to reduce the number of segments by
-merging them.
+The force merge API allows you to force a <<index-modules-merge,merge>> on the
+shards of one or more indices. Merging reduces the number of segments in each
+shard by merging some of them together, and also frees up the space used by
+deleted documents. Merging normally happens automatically, but sometimes it is
+useful to trigger a merge manually.
 
-This call will block until the merge is complete. If the http connection is
-lost, the request will continue in the background, and any new requests will
-block until the previous force merge is complete.
+WARNING: **Force merge should only be called against an index after you have
+finished writing to it.** Force merge can cause very large (>5GB) segments to
+be produced, and if you continue to write to such an index then the automatic
+merge policy will never consider these segments for future merges until they
+mostly consist of deleted documents. This can cause very large segments to
+remain in the index which can result in increased disk usage and worse search
+performance.
 
-WARNING: Force merge should only be called against *read-only indices*. Running
-force merge against a read-write index can cause very large segments to be produced
-(>5Gb per segment), and the merge policy will never consider it for merging again until
-it mostly consists of deleted docs. This can cause very large segments to remain in the shards.
+Calls to this API block until the merge is complete. If the client connection
+is lost before completion then the force merge process will continue in the
+background. Any new requests to force merge the same indices will also block
+until the ongoing force merge is complete.
 
 [source,js]
 --------------------------------------------------
@@ -22,6 +27,22 @@ POST /twitter/_forcemerge
 // CONSOLE
 // TEST[setup:twitter]
 
+Force-merging can be useful with time-based indices and when using
+<<indices-rollover-index,rollover>>. In these cases each index only receives
+indexing traffic for a certain period of time, and once an index will receive
+no more writes its shards can be force-merged down to a single segment:
+
+[source,js]
+--------------------------------------------------
+POST /logs-000001/_forcemerge?max_num_segments=1
+--------------------------------------------------
+// CONSOLE
+// TEST[setup:twitter]
+// TEST[s/logs-000001/twitter/]
+
+This can be a good idea because single-segment shards can sometimes use simpler
+and more efficient data structures to perform searches.
+
 [float]
 [[forcemerge-parameters]]
 ==== Request Parameters
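To make the description of merging concrete (fewer segments, and the space held by deleted documents reclaimed), here is a deliberately simplified Python model. It is a toy: real Lucene segments are immutable files and the merge policy decides which ones to combine, so the `Segment` class and `force_merge` function are hypothetical and only mimic the effect of `max_num_segments=1`.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a segment: stored docs keyed by _id, plus the ids
    that are marked as deleted but still take up space."""
    docs: dict[str, str]
    deleted: set[str] = field(default_factory=set)

    def live_docs(self) -> dict[str, str]:
        return {i: d for i, d in self.docs.items() if i not in self.deleted}

    def size(self) -> int:
        # Deleted docs are still stored until a merge rewrites the segment.
        return len(self.docs)


def force_merge(segments: list[Segment]) -> list[Segment]:
    """Hypothetical sketch of max_num_segments=1: copy every live doc into
    one new segment and drop the deleted ones."""
    merged: dict[str, str] = {}
    for seg in segments:
        merged.update(seg.live_docs())
    return [Segment(merged)] if merged else []


shard = [
    Segment({"a": "doc a", "b": "doc b"}, deleted={"b"}),
    Segment({"c": "doc c", "d": "doc d"}, deleted={"d"}),
    Segment({"e": "doc e"}),
]
before = sum(s.size() for s in shard)
merged = force_merge(shard)
after = sum(s.size() for s in merged)
print(len(shard), before, "->", len(merged), after)  # 3 5 -> 1 3
```

Three segments storing five documents (two of them deleted) become one segment storing only the three live documents.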
@@ -30,7 +30,7 @@ efficiently compressed.
 
 By default, global ordinals are loaded at search-time, which is the right
 trade-off if you are optimizing for indexing speed. However, if you are more
-interested in search speed, it could be interesting to set
+interested in search speed, it could be beneficial to set
 `eager_global_ordinals: true` on fields that you plan to use in terms
 aggregations:
 
@@ -49,9 +49,25 @@ PUT my_index/_mapping
 // CONSOLE
 // TEST[s/^/PUT my_index\n/]
 
-This will shift the cost from search-time to refresh-time. Elasticsearch will
-make sure that global ordinals are built before publishing updates to the
-content of the index.
+This will shift the cost of building the global ordinals from search-time to
+refresh-time. Elasticsearch will make sure that global ordinals are built
+before exposing to searches any changes to the content of the index.
+Elasticsearch will also eagerly build global ordinals when starting a new copy
+of a shard, such as when increasing the number of replicas or when relocating a
+shard onto a new node.
+
+If a shard has been <<indices-forcemerge,force-merged>> down to a single
+segment then its global ordinals are identical to the ordinals for its unique
+segment, which means there is no extra cost for using global ordinals on such a
+shard. Note that for performance reasons you should only force-merge an index
+to which you will never write again.
+
+On a <<frozen-indices,frozen index>>, global ordinals are discarded after each
+search and rebuilt again on the next search if needed or if
+`eager_global_ordinals` is set. This means `eager_global_ordinals` should not
+be used on frozen indices. Instead, force-merge an index to a single segment
+before freezing it so that global ordinals need not be built separately on each
+search.
 
 If you ever decide that you do not need to run `terms` aggregations on this
 field anymore, then you can disable eager loading of global ordinals at any
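The new claim that a single-segment shard gets global ordinals at no extra cost follows from how global ordinals relate to segment ordinals. The Python sketch below is a simplified model, not the Elasticsearch or Lucene implementation (which stores these mappings far more compactly): each segment's ordinals are positions in its own sorted term dictionary, and global ordinals are positions in the sorted union of all segments' terms. The function names are hypothetical.

```python
def build_global_ordinals(segments: list[list[str]]) -> list[list[int]]:
    """For each segment, map local ordinal -> global ordinal.

    Segment-local ordinals are positions in that segment's sorted unique
    terms; global ordinals are positions in the sorted union across all
    segments. Computing this means merging every term dictionary, which is
    the work that eager_global_ordinals moves from search to refresh time.
    """
    all_terms = sorted(set().union(*segments))
    global_ord = {term: i for i, term in enumerate(all_terms)}
    return [[global_ord[t] for t in sorted(set(terms))] for terms in segments]


def is_identity(mapping: list[int]) -> bool:
    """True if every local ordinal equals its global ordinal."""
    return all(g == local for local, g in enumerate(mapping))


multi = [["red", "blue"], ["green", "blue", "yellow"]]
single = [["red", "blue", "green", "blue", "yellow"]]

print(build_global_ordinals(multi))   # [[0, 2], [0, 1, 3]]
print(build_global_ordinals(single))  # [[0, 1, 2, 3]]
print(is_identity(build_global_ordinals(single)[0]))  # True
```

With one segment the mapping is the identity, so there is nothing extra to build or hold in memory; this is why the text suggests force-merging an index to a single segment before freezing it.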