Reword frozen indices introduction (#40160)

Clarifies that we don't move anything extra to persistent storage when freezing
an index, and gives a bit more context for less experienced users.
David Turner 2019-03-20 11:56:36 +00:00
parent 235f57989f
commit 522f33e9eb
1 changed files with 33 additions and 12 deletions


@@ -5,20 +5,41 @@
 [partintro]
 --
-Elasticsearch indices can require a significant amount of memory available in order to be open and searchable. Yet, not all indices need
-to be writable at the same time and have different access patterns over time. For example, indices in the time series or logging use cases
-are unlikely to be queried once they age out but still need to be kept around for retention policy purposes.
+{es} indices keep some data structures in memory to allow you to search them
+efficiently and to index into them. If you have a lot of indices then the
+memory required for these data structures can add up to a significant amount.
+For indices that are searched frequently it is better to keep these structures
+in memory because it takes time to rebuild them. However, you might access some
+of your indices so rarely that you would prefer to release the corresponding
+memory and rebuild these data structures on each search.
-In order to keep indices available and queryable for a longer period but at the same time reduce their hardware requirements they can be transitioned
-into a frozen state. Once an index is frozen, all of its transient shard memory (aside from mappings and analyzers)
-is moved to persistent storage. This allows for a much higher disk to heap storage ratio on individual nodes. Once an index is
-frozen, it is made read-only and drops its transient data structures from memory. These data structures will need to be reloaded on demand (and subsequently dropped) for each search request that targets the frozen index. A search request that hits
-one or more frozen shards will be executed on a throttled threadpool that ensures that we never search more than
-`N` (`1` by default) searches concurrently (see <<search-throttled>>). This protects nodes from exceeding the available memory due to incoming search requests.
+For example, if you are using time-based indices to store log messages or time
+series data then it is likely that older indices are searched much less often
+than the more recent ones. Older indices also receive no indexing requests.
+Furthermore, it is usually the case that searches of older indices are for
+performing longer-term analyses for which a slower response is acceptable.
-In contrast to ordinary open indices, frozen indices are expected to execute slowly and are not designed for high query load. Parallelism is
-gained only on a per-node level and loading data-structures on demand is expected to be one or more orders of a magnitude slower than query
-execution on a per shard level. Depending on the data in an index, a frozen index may execute searches in the seconds to minutes range, when the same index in an unfrozen state may execute the same search request in milliseconds.
+If you have such indices then they are good candidates for becoming _frozen
+indices_. {es} builds the transient data structures of each shard of a frozen
+index each time that shard is searched, and discards these data structures as
+soon as the search is complete. Because {es} does not maintain these transient
+data structures in memory, frozen indices consume much less heap than normal
+indices. This allows for a much higher disk-to-heap ratio than would otherwise
+be possible.
+Searches performed on frozen indices use the small, dedicated,
+<<search-throttled,`search_throttled` threadpool>> to control the number of
+concurrent searches that hit frozen shards on each node. This limits the amount
+of extra memory required for the transient data structures corresponding to
+frozen shards, which consequently protects nodes against excessive memory
+consumption.
+Frozen indices are read-only: you cannot index into them.
+Searches on frozen indices are expected to execute slowly. Frozen indices are
+not intended for high search load. It is possible that a search of a frozen
+index may take seconds or minutes to complete, even if the same searches
+completed in milliseconds when the indices were not frozen.
 --
 == Best Practices
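The reworded text says that each search of a frozen shard rebuilds that shard's transient data structures and discards them when the search completes. It also says that a small, dedicated `search_throttled` threadpool caps how many such searches run at once on each node. The sketch below is a toy Python model of that idea, not Elasticsearch code. `FrozenShardSearcher` and its methods are hypothetical names, and a per-search inverted index stands in for the real per-shard structures. It shows why bounding the pool size bounds peak transient memory: at most `pool_size` structures are ever alive at the same time.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FrozenShardSearcher:
    """Hypothetical toy model of searching frozen shards (not Elasticsearch code).

    Each search rebuilds a transient structure for its shard and releases it
    when the search returns; a fixed-size pool bounds how many exist at once.
    """

    def __init__(self, pool_size: int = 1) -> None:
        # Stand-in for the `search_throttled` threadpool; 1 mirrors the default
        # concurrency mentioned in the removed text.
        self._pool = ThreadPoolExecutor(max_workers=pool_size)
        self._lock = threading.Lock()
        self._live = 0       # transient structures currently alive
        self.peak_live = 0   # most structures alive at the same time

    def _search(self, shard_docs: list[str], term: str) -> int:
        with self._lock:
            self._live += 1
            self.peak_live = max(self.peak_live, self._live)
        try:
            # Rebuild a toy inverted index for this search only.
            index: dict[str, set[int]] = {}
            for doc_id, text in enumerate(shard_docs):
                for token in text.split():
                    index.setdefault(token, set()).add(doc_id)
            # Number of documents in this shard containing `term`.
            return len(index.get(term, set()))
        finally:
            # The index becomes garbage once this function returns.
            with self._lock:
                self._live -= 1

    def submit(self, shard_docs: list[str], term: str):
        return self._pool.submit(self._search, shard_docs, term)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


searcher = FrozenShardSearcher(pool_size=1)
shards = [["error disk full", "ok"], ["error timeout", "error again"]] * 4
futures = [searcher.submit(docs, "error") for docs in shards]
counts = [f.result() for f in futures]
searcher.shutdown()
print(counts, searcher.peak_live)  # prints "[1, 2, 1, 2, 1, 2, 1, 2] 1"
```

Raising `pool_size` allows more frozen-shard searches in parallel, at the cost of proportionally more transient memory. Keeping it small trades latency for a predictable memory ceiling.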